
gstkaldinnet2onlinedecoder vs online2-tcp-nnet3-decoder-faster #241

Closed
Umar17 opened this issue Jun 14, 2020 · 7 comments

Umar17 commented Jun 14, 2020

Hi,

I just experimented with online decoding using online2-tcp-nnet3-decoder-faster, which I had previously been doing with kaldinnet2onlinedecoder (through kaldi-gstreamer-server). I saw roughly 3x faster decoding with online2-tcp-nnet3-decoder-faster. I went through the code of both decoders and they appear to work almost identically. Can you please guide me as to why the latter is faster? Is it a mistake on my side, or something else?

PS: the parameters (beam, lattice-beam and max-active) were kept identical for both decoders.

Best Regards
Umar

alumae (Owner) commented Jun 15, 2020

Probably you are using chain models and are missing the attribute frame-subsampling-factor: 3 under the decoder conf in the YAML file.
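One way to verify that the setting is actually picked up is to load the worker config and inspect the decoder section. A minimal sketch using PyYAML; the config filename is a placeholder for your own worker YAML:

```python
import yaml  # PyYAML

# Load the worker config and check for the chain-model setting
# under the decoder section. The path below is a placeholder.
with open("sample_worker.yaml") as f:
    conf = yaml.safe_load(f)

# For chain models this should print 3; None means the attribute is missing.
print(conf["decoder"].get("frame-subsampling-factor"))
```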

Umar17 (Author) commented Jun 15, 2020

Yes, I am using a chain model, but the frame-subsampling-factor option is in place. Attached is my YAML file.


use-nnet2: True
decoder:
    use-threaded-decoder: True
    nnet-mode: 3
    model: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/final.mdl
    word-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/words.txt
    fst: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/HCLG.fst
    mfcc-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/mfcc.conf
    ivector-extraction-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 1.0
    do-endpointing: true
    endpoint-silence-phones: "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.01
    chunk-length-in-secs: 0.25
    frame-subsampling-factor: 3
    num-nbest: 10
    # Additional functionality that you can play with:
    #lm-fst: test/models/english/librispeech_nnet_a_online/G.fst
    #big-lm-const-arpa: test/models/english/librispeech_nnet_a_online/G.carpa
    phone-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/phones.txt
    #word-boundary-file: test/models/english/librispeech_nnet_a_online/word_boundary.int
    #do-phone-alignment: true
out-dir: tmp/urdu

use-vad: False
silence-timeout: 60

post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'

logging:
    version: 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]


And the client command is:
python kaldigstserver/client.py -r 32000 c2a.wav
where the sample wave file is sampled at 16 kHz.

Umar17 (Author) commented Jun 15, 2020

I have tweaked frame-subsampling-factor and, surprisingly, it has no effect on latency.

alumae (Owner) commented Jun 15, 2020

Can you give some numbers -- the actual difference in decoding time that you are seeing?

I assume you understand that the -r 32000 option in client.py means that the audio is sent to the server at this byte rate. If the wav indeed uses 16 kHz 16-bit encoding, then decoding cannot complete faster than realtime, as the audio is sent to the server at a rate that simulates realtime recording from the mic.
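For reference, the arithmetic behind that byte rate, as a quick check in plain Python (16 kHz, 16-bit, mono PCM):

```python
# Byte rate of 16 kHz, 16-bit, mono PCM audio:
sample_rate = 16000   # samples per second
sample_width = 2      # bytes per sample (16-bit)
channels = 1

byte_rate = sample_rate * sample_width * channels
print(byte_rate)  # 32000 -> "-r 32000" streams the file at exactly realtime speed
```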

Umar17 (Author) commented Jun 15, 2020

Numbers (in milliseconds)
Audio length: 4923
Latency (with -r 32000): 5801
Latency (with -r 256000): 2965
Latency (online2-tcp-nnet3-decode-faster): 1343

Yes, I understand the byte rate, and I experimented with -r 256000 as well, which should send the whole audio within the first second (the intuition is to imitate the client for online2-tcp-nnet3-decode-faster, which feeds the whole audio and then half-closes the socket connection; a sketch of that pattern is below). It doesn't affect accuracy and improves efficiency a bit.
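A minimal sketch of that send-everything-then-half-close pattern against a raw TCP decoder. HOST and PORT are placeholders (use whatever port the server listens on), and the input file is assumed to be headerless 16-bit PCM at the sample rate the server expects:

```python
import socket
import sys

HOST, PORT = "localhost", 5050  # placeholders; match your server's address and port

# Stream the whole raw PCM file at once, half-close the write side,
# then keep reading transcription results until the server closes.
with open(sys.argv[1], "rb") as f, socket.create_connection((HOST, PORT)) as sock:
    sock.sendall(f.read())          # send all audio as fast as the network allows
    sock.shutdown(socket.SHUT_WR)   # half-close: "no more audio", but keep reading
    while True:
        data = sock.recv(4096)
        if not data:
            break
        sys.stdout.write(data.decode("utf-8", errors="replace"))
```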

alumae (Owner) commented Jun 15, 2020

Try changing to traceback-period-in-secs: 0.25 -- with 0.01, the worker generates intermediate results about 100 times a second, which adds overhead.

Umar17 (Author) commented Jun 17, 2020

Tried it, but it had no effect. However, averaging multiple experiments gives a difference of ~1 second in latency between -r 256000 and the TCP decoder.
I think the latency is higher in the gstreamer case because of the server-worker-decoder architecture, where communication is slower than with the online2-tcp-nnet3-decode-faster server.
If that is so, this issue can be closed.

@alumae alumae closed this as completed Jun 17, 2020