
gstkaldinnet2onlinedecoder vs online2-tcp-nnet3-decoder-faster #241

Closed
Umar17 opened this issue Jun 14, 2020 · 7 comments

Umar17 commented Jun 14, 2020

Hi,

I just experimented with online decoding using online2-tcp-nnet3-decoder-faster, which I had previously been doing with kaldinnet2onlinedecoder (through kaldi-gstreamer-server). I saw roughly 3x faster decoding with online2-tcp-nnet3-decoder-faster. I went through the code of both decoders and they appear to work almost identically. Can you please guide me as to why the latter is faster? Is it a mistake on my side, or something else?

PS: the parameters (beam, lattice-beam and max-active) were kept identical for both decoders.

Best Regards
Umar

alumae (Owner) commented Jun 15, 2020

Probably you are using chain models and are missing the attribute frame-subsampling-factor: 3 under the decoder conf in the YAML file.
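One way to verify that the setting is actually picked up is to load the worker config and inspect the decoder section. A minimal sketch using PyYAML; the config filename is a placeholder for your own worker YAML:

```python
import yaml  # PyYAML

# Load the worker config and check for the chain-model setting
# under the decoder section. The path below is a placeholder.
with open("sample_worker.yaml") as f:
    conf = yaml.safe_load(f)

# For chain models this should print 3; None means the attribute is missing.
print(conf["decoder"].get("frame-subsampling-factor"))
```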

Umar17 (Author) commented Jun 15, 2020

Yes, I am using a chain model, but the frame-subsampling-factor option is in place. Attached is my YAML file.


use-nnet2: True
decoder:
    use-threaded-decoder: True
    nnet-mode: 3
    model: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/final.mdl
    word-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/words.txt
    fst: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/HCLG.fst
    mfcc-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/mfcc.conf
    ivector-extraction-config: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 1.0
    do-endpointing: true
    endpoint-silence-phones: "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.01
    chunk-length-in-secs: 0.25
    frame-subsampling-factor: 3
    num-nbest: 10
    # Additional functionality that you can play with:
    #lm-fst: test/models/english/librispeech_nnet_a_online/G.fst
    #big-lm-const-arpa: test/models/english/librispeech_nnet_a_online/G.carpa
    phone-syms: /home/cle-26/Downloads/kaldi-gstreamer-server-master/nnet3_chain/phones.txt
    #word-boundary-file: test/models/english/librispeech_nnet_a_online/word_boundary.int
    #do-phone-alignment: true
out-dir: tmp/urdu

use-vad: False
silence-timeout: 60

post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'

logging:
    version: 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]


And the client command is:
python kaldigstserver/client.py -r 32000 c2a.wav
where the sample wave file is sampled at 16 kHz.

Umar17 (Author) commented Jun 15, 2020

I have tweaked frame-subsampling-factor and, surprisingly, it has no effect on latency.

alumae (Owner) commented Jun 15, 2020

Can you give some numbers -- the actual difference in decoding time that you are seeing?

I assume you understand that the -r 32000 option in client.py means that the audio is sent to the server at this byte rate. If the wav indeed uses 16 kHz 16-bit encoding, then decoding cannot complete faster than realtime, as the audio is sent to the server at a rate that simulates realtime recording from the mic.
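For reference, the arithmetic behind that byte rate, as a quick check in plain Python (16 kHz, 16-bit, mono PCM):

```python
# Byte rate of 16 kHz, 16-bit, mono PCM audio:
sample_rate = 16000   # samples per second
sample_width = 2      # bytes per sample (16-bit)
channels = 1

byte_rate = sample_rate * sample_width * channels
print(byte_rate)  # 32000 -> "-r 32000" streams the file at exactly realtime speed
```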

Umar17 (Author) commented Jun 15, 2020

Numbers (in milliseconds)
Audio length: 4923
Latency (with -r 32000): 5801
Latency (with -r 256000): 2965
Latency (online2-tcp-nnet3-decode-faster): 1343

Yes, I understand the byte rate, and I experimented with -r 256000 as well, which should send the whole audio within the first second (the intuition is to imitate the client for online2-tcp-nnet3-decode-faster, which feeds the whole audio and then half-closes the socket connection; a sketch of that pattern is below). It doesn't affect accuracy and improves efficiency a bit.
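A minimal sketch of that send-everything-then-half-close pattern against a raw TCP decoder. HOST and PORT are placeholders (use whatever port the server listens on), and the input file is assumed to be headerless 16-bit PCM at the sample rate the server expects:

```python
import socket
import sys

HOST, PORT = "localhost", 5050  # placeholders; match your server's address and port

# Stream the whole raw PCM file at once, half-close the write side,
# then keep reading transcription results until the server closes.
with open(sys.argv[1], "rb") as f, socket.create_connection((HOST, PORT)) as sock:
    sock.sendall(f.read())          # send all audio as fast as the network allows
    sock.shutdown(socket.SHUT_WR)   # half-close: "no more audio", but keep reading
    while True:
        data = sock.recv(4096)
        if not data:
            break
        sys.stdout.write(data.decode("utf-8", errors="replace"))
```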

alumae (Owner) commented Jun 15, 2020

Try changing to traceback-period-in-secs: 0.25 -- with 0.01, the worker generates intermediate results about 100 times a second, which adds overhead.

Umar17 (Author) commented Jun 17, 2020

Tried it, but it had no effect. However, averaging multiple experiments gives a difference of ~1 second in latency between -r 256000 and the TCP decoder.
I think the latency is higher in the gstreamer case because of the server-worker-decoder architecture, where communication is slower than with the online2-tcp-nnet3-decode-faster server.
If that is so, this issue can be closed.

@alumae alumae closed this as completed Jun 17, 2020