Pruning speedups beam search #115

Closed
kuke opened this issue Jun 21, 2017 · 1 comment

kuke commented Jun 21, 2017

The CTC beam search in DS2, also known as prefix beam search, works by repeatedly appending candidate characters to prefixes and querying the n-gram language model. Both steps are time-consuming, which makes parameter tuning and deployment difficult.

A proven remedy is to prune the beam search. Specifically, when extending a prefix, only the smallest set of characters whose cumulative probability is at least p needs to be considered, rather than every character in the vocabulary. With p = 0.99, as recommended by the DS2 paper, pruning yields about a 20x speedup in English transcription compared with unpruned search, with very little loss in accuracy. For Mandarin, the reported speedup is up to 150x.
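
The pruning rule can be sketched in a few lines. This is only an illustration of the cumulative-probability cutoff, not the decoder's actual code; the function name `prune_by_cumulative_prob` and the parameter `cutoff_prob` are made up for this example, and `probs` is assumed to be the per-character output distribution at a single time step.

```python
def prune_by_cumulative_prob(probs, cutoff_prob=0.99):
    """Return indices of the fewest characters whose total probability
    reaches cutoff_prob, ordered from most to least probable.

    Hypothetical helper for illustration; `probs` is assumed to be the
    character distribution emitted by the acoustic model at one time step.
    """
    # Visit characters from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        # Stop as soon as the kept characters cover enough probability mass.
        if cumulative >= cutoff_prob:
            break
    return kept


if __name__ == "__main__":
    step_probs = [0.06, 0.7, 0.01, 0.2, 0.03]
    # Only the returned characters would be used to extend each prefix.
    print(prune_by_cumulative_prob(step_probs, cutoff_prob=0.95))
```

Because the output distribution is usually sharply peaked, the kept set tends to be much smaller than the vocabulary, which is where the savings in prefix extensions and language model lookups would come from.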

Pruning also makes parameter tuning more efficient. Beam search has two important parameters, alpha and beta, which weight the language model and word insertion respectively. With decoding now fast enough, alpha and beta can be searched thoroughly. The relation between WER and the two parameters turns out to be:
[Figure 1: WER versus alpha and beta]

With the optimal parameters alpha=0.26 and beta=0.1, as shown in the figure above, the beam search decoder currently reduces WER to 13%, down from 22% with best path decoding.
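
For reference, a search over alpha and beta like the one described here could look roughly as follows. This is a minimal sketch under assumptions: `decode` is a hypothetical callable standing in for the beam search decoder (it takes alpha and beta and returns one transcript per utterance), and `word_errors` and `grid_search` are illustrative names, not functions from this project.

```python
import itertools


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,                            # deletion
                            curr[j - 1] + 1,                        # insertion
                            prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = curr
    return prev[-1], len(ref)


def grid_search(decode, references, alphas, betas):
    """Try every (alpha, beta) pair and return (best_wer, alpha, beta).

    `decode(alpha, beta)` is an assumed interface: it should return a list
    of hypotheses aligned with `references`.
    """
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        hypotheses = decode(alpha, beta)
        errors = total = 0
        for ref, hyp in zip(references, hypotheses):
            e, n = word_errors(ref, hyp)
            errors += e
            total += n
        wer = errors / total
        if best is None or wer < best[0]:
            best = (wer, alpha, beta)
    return best
```

Each grid point requires decoding the whole tuning set once, so the cost of the search scales directly with decoding time; this is why pruning makes a fine-grained grid practical.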

@shanyi15

Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up after it is closed, please feel free to reopen it, and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
