This repository has been archived by the owner on Jun 29, 2022. It is now read-only.
OS: Linux with version 16.04
Python3:
Sphinx version:
PocketSphinx 5prealpha
It is the seq2seq g2p model. This was not mentioned specifically, but what is meant by an Indian English g2p model with seq2seq?
How can I train such an Indian English g2p model? I am really confused, sir.
I take the word list, a text file with one word per line (own_vocab.tmp.vocab),
and run the command below: **g2p-seq2seq --decode own_vocab.tmp.vocab --model_dir g2p-seq2seq-model-6.2-cmudict-nostress --output own.dic**
That gives me own.dic. This is all I know, sir.
How can I train an Indian English g2p model with seq2seq? Sir, could you explain it to me? @nshmyrev Thank you so much, sir.
Problem
Sir, I have not been able to integrate a custom en-in acoustic model (the default Indian English acoustic model, adapted) with a custom language model.
I downloaded an acoustic model from this link:
I followed the instructions at this link: https://cmusphinx.github.io/wiki/tutorialam/
sphinx_fe -argfile en_in/feat.params -samprate 16000 -c audio.fileids -di . -do . -ei wav -eo mfc -mswav yes
pocketsphinx_mdef_convert -text en_in/mdef en_in/mdef.txt
cp -a /usr/local/libexec/sphinxtrain/bw .
cp -a /usr/local/libexec/sphinxtrain/mk_s2sendump .
cp -a /usr/local/libexec/sphinxtrain/map_adapt .
cp -a /usr/local/libexec/sphinxtrain/mllr_solve .
./bw \
 -hmmdir en_in \
 -moddeffn en_in/mdef.txt \
 -ts2cbfn .cont. \
 -feat 1s_c_d_dd \
 -cmn current \
 -agc none \
 -dictfn en_in.dic \
 -ctlfn audio.fileids \
 -lsnfn audio.transcription \
 -accumdir .
./mllr_solve \
 -meanfn en_in/means \
 -varfn en_in/variances \
 -outmllrfn mllr_matrix -accumdir .
cp -a en_in en_in_own
./map_adapt \
 -moddeffn en_in/mdef.txt \
 -ts2cbfn .cont. \
 -meanfn en_in/means \
 -varfn en_in/variances \
 -mixwfn en_in/mixture_weights \
 -tmatfn en_in/transition_matrices \
 -accumdir . \
 -mapmeanfn en_in_own/means \
 -mapvarfn en_in_own/variances \
 -mapmixwfn en_in_own/mixture_weights \
 -maptmatfn en_in_own/transition_matrices
./mk_s2sendump \
 -pocketsphinx yes \
 -moddeffn en_in_own/mdef.txt \
 -mixwfn en_in_own/mixture_weights \
 -sendumpfn en_in_own/sendump
pocketsphinx_continuous -hmm en_in_own -lm en-us.lm.bin -dict en_in.dic -infile 38.wav > 4.txt
It works, but it does not recognize certain words that are specific to the banking sector. So I built my own language model with the language model build tool (Building a simple language model using a web service).
Own language model: lm.dict & lm.bin:
Transcript file: own_vocab.txt
sphinx_lm_convert -i own.lm -o own.lm.bin
sphinx_lm_convert -i own.lm.bin -ifmt bin -o own.lm -ofmt arpa
pocketsphinx_continuous -inmic yes -lm own.lm.bin -dict own.dic
Sir, this works fine and detects those particular words, but there is one confusion.
When I combine the two, the adapted AM and this LM, and run:
pocketsphinx_continuous -hmm en_in_own -lm own.lm.bin -dict own.dic -infile 1.wav > result_own.txt
It did not return any words and showed errors: phones used in the dictionary are reported as not present in the AM.
INFO: dict.c(333): Reading main dictionary: lm_model_resources/other/own.dic
ERROR: "dict.c", line 195: Line 5: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 7: Phone 'EY' is mising in the acoustic model; word 'a' ignored
ERROR: "dict.c", line 195: Line 8: Phone 'EY' is mising in the acoustic model; word 'able' ignored
ERROR: "dict.c", line 195: Line 9: Phone 'AH' is mising in the acoustic model; word 'about' ignored
ERROR: "dict.c", line 195: Line 10: Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored
I think I have identified the issue. The phones in own.dic (EH, EY, AH, AE) are all present in the en_in acoustic model (Indian English mdef phones), but in lower case (en_in/mdef file).
But other English mdef phones, like wsj_all_cd30.mllt_cd_cont_4000, hub4_cd_continuous_8gau_1s_c_d_dd,
Columns definitions
#base lft rt p attrib tmat ... state id's ...
SIL - - - filler 0 0 1 2 N
UNK - - - n/a 1 3 4 5 N
aa - - - n/a 2 6 7 8 N
ae - - - n/a 3 9 10 11 N
ah - - - n/a 4 12 13 14 N
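To see exactly which phone names (and which letter case) the acoustic model expects, the base phones can be pulled out of the text mdef. This is only a rough sketch: the function name `list_mdef_phones` is made up for illustration, and it assumes that base-phone rows are the ones with `-` in the lft, rt and p columns, as in the excerpt above.

```shell
# Print the context-independent (base) phones from a text-format mdef file.
# Assumption: base-phone rows have "-" in the lft, rt and p columns,
# and header/comment rows either start with "#" or have fewer columns.
list_mdef_phones() {
    awk '$1 !~ /^#/ && $2 == "-" && $3 == "-" && $4 == "-" { print $1 }' "$1"
}

# Example call (path taken from the commands in this issue):
#   list_mdef_phones en_in/mdef.txt
```

Comparing this list with the phone columns of own.dic should show whether the two differ only in case or also in the phone names themselves.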
I tried converting the phones in own.dic to lower case, but it had no effect when running the AM and LM together.
Basically, the LM tool produces words and phones in this structure, and that conflicts with the acoustic model. The two are not in sync.
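For reference, lowercasing only the phone columns of a dictionary (leaving the word in the first column untouched) could be sketched as below. The function name `lowercase_dict_phones` is hypothetical, and the sketch assumes that case is the only difference between the dictionary phones and the en_in mdef phones, which the log alone does not confirm.

```shell
# Rewrite a pronunciation dictionary so that every phone is lower case.
# Column 1 (the word, possibly with a "(2)" variant marker) is kept as-is.
# Assumption: the acoustic model's phones differ from these only by case.
lowercase_dict_phones() {
    awk '{
        line = $1
        for (i = 2; i <= NF; i++) line = line " " tolower($i)
        print line
    }' "$1"
}

# Example call (file names are placeholders):
#   lowercase_dict_phones own.dic > own_lower.dic
```

The converted file only helps if it is the one actually passed to `-dict`. The log above reads `lm_model_resources/other/own.dic`, so it is worth checking that the edited copy and the loaded copy are the same file.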
I also tried another way of creating own.lm.bin and own.dic.
Building the LM another way:
text2wfreq < own_vocab.txt | wfreq2vocab > own_vocab.tmp.vocab
text2idngram -vocab own_vocab.tmp.vocab -idngram own_vocab.idngram < own_vocab.txt
idngram2lm -vocab_type 0 -idngram own_vocab.idngram -vocab own_vocab.tmp.vocab -arpa own.lm
sphinx_lm_convert -i own.lm -o own.lm.bin
Building own.dic another way:
I followed these links: &
g2p-seq2seq --decode own_vocab.tmp.vocab --model_dir g2p-seq2seq/g2p-seq2seq-model-6.2-cmudict-nostress --output own.dic
pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt
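Since the LM vocabulary and the dictionary are generated by separate tools here, a quick coverage check between the two can rule out one source of mismatch. The following is a sketch under assumptions: the function name `missing_dict_words` is made up, the comparison is case-sensitive, and lines starting with `#` in the vocabulary file are treated as comments.

```shell
# List vocabulary words that have no entry in the dictionary.
# Usage: missing_dict_words DICT VOCAB
# Assumptions: one word per line in VOCAB, "#" lines are comments,
# and dictionary variants such as "a(2)" count as the word "a".
missing_dict_words() {
    awk '
        FNR == NR { w = $1; sub(/\([0-9]+\)$/, "", w); have[w] = 1; next }
        NF > 0 && $1 !~ /^#/ && !($1 in have) { print $1 }
    ' "$1" "$2"
}

# Example call (file names taken from the commands above):
#   missing_dict_words own.dic own_vocab.tmp.vocab
```

An empty result means every vocabulary word has at least one pronunciation. It says nothing about whether the phones in those pronunciations exist in the acoustic model.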
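It works fine and recognizes those particular words, but my confusion is: which acoustic model is used when I run "pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt"?
When I combine the two, the adapted AM and this LM, and run: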
pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt -hmm en_in_own
Again it returned the same error and did not output any text. The error log is:
INFO: dict.c(333): Reading main dictionary: lm_model_resources/other/own.dic
ERROR: "dict.c", line 195: Line 5: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 7: Phone 'EY' is mising in the acoustic model; word 'a' ignored
ERROR: "dict.c", line 195: Line 8: Phone 'EY' is mising in the acoustic model; word 'able' ignored
ERROR: "dict.c", line 195: Line 9: Phone 'AH' is mising in the acoustic model; word 'about' ignored
ERROR: "dict.c", line 195: Line 10: Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored
Dict (word-phone) format produced by the LM tool:
A AH
A(2) EY
ABLE EY B AH L
ABOUT AH B AW T
ABSOLUTELY AE B S AH L UW T L IY
Dict (word-phone) format produced by g2p-seq2seq:
s EH S
s EH S
a EY
able EY B AH L
about AH B AW T
absolutely AE B S AH L UW T L IY
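The g2p-seq2seq output above contains the entry `s EH S` twice. Whether that matters to the decoder is not something the log shows, but dropping exact duplicate lines is cheap. A minimal sketch, with the hypothetical function name `dedupe_dict`:

```shell
# Remove exact duplicate lines from a dictionary, keeping first occurrences
# and preserving the original order.
dedupe_dict() {
    awk '!seen[$0]++' "$1"
}

# Example call (file names are placeholders):
#   dedupe_dict own.dic > own_dedup.dic
```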
en_in_own mdef phones structure:
ia f aa s n/a 20 2023 2038 2063 N
ia f ae e n/a 20 2023 2038 2063 N
ia f ae s n/a 20 2023 2038 2063 N
ia f ah e n/a 20 2023 2038 2063 N
ia f ah s n/a 20 2023 2038 2063 N
ia f ao e n/a 20 2023 2038 2063 N
ia f ao s n/a 20 2023 2038 2063 N
ia f aw e n/a 20 2023 2038 2063 N
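To tell a pure case problem apart from a genuinely different phone set, the dictionary can be checked against the mdef while ignoring case. This is a sketch only: `report_missing_phones` is a hypothetical name, and it relies on the same assumption as before, that base-phone rows in the text mdef have `-` in the lft, rt and p columns.

```shell
# Print "word phone" for every dictionary phone that has no counterpart
# in the mdef base phones, even when letter case is ignored.
# Usage: report_missing_phones MDEF_TXT DICT
report_missing_phones() {
    awk '
        FNR == NR {
            if ($1 !~ /^#/ && $2 == "-" && $3 == "-" && $4 == "-")
                known[tolower($1)] = 1
            next
        }
        { for (i = 2; i <= NF; i++) if (!(tolower($i) in known)) print $1, $i }
    ' "$1" "$2"
}

# Example call (paths taken from the commands in this issue):
#   report_missing_phones en_in_own/mdef.txt own.dic
```

If this prints nothing, the phone names agree up to case and only the casing needs fixing. Any lines it does print point at phones that the en_in model does not define under any casing.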
Is the lower case really the issue or not? I was not able to determine this.
Sir, how can I fix this issue?
I could not integrate the custom en-in AM with the custom LM. ERROR: "dict.c", Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored