loss value and decode library? #30
Please have a look at https://github.com/k2-fsa/icefall. You can find TensorBoard training logs in https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md
The average loss per frame is about 0.02 or below.
Yes, please see modified beam search in https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/transducer_stateless/beam_search.py#L363. There is only one loop over the time axis. We have documentation for how to use it with a pre-trained model; please see https://icefall.readthedocs.io/en/latest/recipes/aishell/stateless_transducer.html. There is also a Colab notebook for it.
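To illustrate the "only one loop over the time axis" idea, here is a minimal toy sketch of transducer greedy search. The `joiner` callable below is a hypothetical stand-in for the model's joiner network (it is not the icefall API); it returns per-token scores given the current frame index and the decoded context.

```python
# Toy sketch: transducer greedy search with a single loop over time.
# `joiner` is a hypothetical stand-in for a real joiner network.
from typing import Callable, List


def greedy_search(
    joiner: Callable[[int, List[int]], List[float]],
    num_frames: int,
    blank_id: int = 0,
    max_sym_per_frame: int = 3,
) -> List[int]:
    hyp: List[int] = []
    for t in range(num_frames):  # the only loop over the time axis
        for _ in range(max_sym_per_frame):
            scores = joiner(t, hyp)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == blank_id:
                break  # blank emitted: advance to the next frame
            hyp.append(best)
    return hyp


def toy_joiner(t: int, hyp: List[int]) -> List[float]:
    # Hypothetical joiner: emits token 2 once on frame 1, otherwise blank.
    if t == 1 and 2 not in hyp:
        return [0.0, 0.1, 0.9]
    return [0.9, 0.05, 0.05]


print(greedy_search(toy_joiner, num_frames=3))  # -> [2]
```

The real `modified_beam_search` keeps several hypotheses per utterance instead of one, but the time-axis structure is the same.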
Note: The above beam search is implemented in Python and it decodes only one utterance at a time. We are implementing it in C++ with CUDA, which can decode multiple utterances in parallel. It will be wrapped for Python soon.
Wow, your answer is really helpful, thank you very much.
Dear csukuangfj!
The speed is much better; thanks for your work. I'm here to ask whether there is any documentation about the C++ decode interface (k2-fsa/k2#926) you mentioned before?
If you try the k2 pruned RNN-T loss, https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless/model.py#L160, there is a Python interface for it; see k2-fsa/icefall#250. We will add a C++ interface for it later, i.e., provide only a header file and some pre-compiled libraries. k2-fsa/icefall#250 is even faster if you use it for decoding.
Second: I have modified the modified beam search method to support batch decoding, so the decoding speed is as follows:
Thanks very much for the information. I will try the interface in k2-fsa/icefall#250 you mentioned some time later.
Please clarify whether the loss is averaged per frame or summed over all frames in the batch.
By the way, how do you measure the decoding time? Do you have an RTF available?
The loss code is as follows: so I guess the loss is the sum of the loss over all frames in the batch. Decoding time: I'm decoding in a batch way, so RTF is not available in this setting. My measurement is very simple: how much time it takes to complete inference on one batch of data. I found that decoding is the bottleneck, as it takes about 99% of the time.
Yes, you can divide it by the number of acoustic frames after subsampling in the model. Please see `info["frames"] = (feature_lens // params.subsampling_factor).sum().item()`
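Putting the two answers together, a small sketch of the arithmetic (the function names and numbers below are illustrative, not from icefall):

```python
# Sketch: normalizing a summed loss to a per-frame value, and computing RTF.


def avg_loss_per_frame(total_loss: float, num_frames: int) -> float:
    # total_loss is summed over all frames in the batch; divide by the
    # number of acoustic frames after subsampling (info["frames"] above).
    return total_loss / num_frames


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    # RTF < 1 means decoding runs faster than real time.
    return decode_seconds / audio_seconds


# A well-trained model's summed loss divided by frames lands near 0.02:
print(avg_loss_per_frame(20.0, 1000))    # -> 0.02
print(real_time_factor(5.0, 100.0))      # -> 0.05
```

For batch decoding, `audio_seconds` would be the total duration of all utterances in the batch, so an RTF can still be reported even without per-utterance timing.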
ok |
thanks very much for your great project!
I have two questions to ask:
1. How big is the transducer loss for a well-performing model, i.e., when the model has converged?
2. Is there any fast decoding solution? I found that the decoding modules in many projects implementing the beam search algorithm are extremely slow.