add ctc beam search decoder #59

Merged: 31 commits into PaddlePaddle:develop on Jul 5, 2017

Conversation

@kuke (Collaborator) commented Jun 1, 2017

resolve PaddlePaddle/Paddle#2230

In progress. Add pseudo code and test information later.

@kuke (Collaborator Author) commented Jun 2, 2017

The algorithm in the prefix beam search paper turns out to be quite confusing and may have some problems in its details. So here is a modification of Algorithm 1, which the code is based on:

from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops

# pack the list of per-step probability arrays into a single tensor
inputs_t = [ops.convert_to_tensor(x) for x in inputs]
inputs_t = array_ops.stack(inputs_t)

# run CTC beam search decoder in tensorflow
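
Presumably the test then continues with the TensorFlow decoder call. A minimal sketch of what that comparison call might look like, assuming TF 1.x; the sequence length, beam width, and top_paths values here are made-up examples, not taken from the PR:

import numpy as np
import tensorflow as tf

# hypothetical toy input: max_time=6, batch=1, num_classes=5 (4 labels + blank)
probs = np.random.rand(6, 1, 5).astype(np.float32)
probs /= probs.sum(axis=2, keepdims=True)

# shape [max_time, batch, num_classes]; log-probabilities are valid logits here
inputs_t = tf.log(tf.constant(probs))
decoded, log_probs = tf.nn.ctc_beam_search_decoder(
    inputs=inputs_t,
    sequence_length=[6],  # assumed: one sequence of full length
    beam_width=20,        # assumed beam size
    top_paths=20)         # return all beams for comparison

with tf.Session() as sess:
    results, scores = sess.run([decoded, log_probs])
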
Contributor:

May I ask why the unit test is written using TensorFlow?

Collaborator Author:

Just to compare the results against TensorFlow.

@kuke (Collaborator Author) commented Jun 2, 2017

Validate the implementation

To confirm its correctness, the implementation is compared with the ctc_beam_search_decoder in TensorFlow under the same input probability matrix and beam size. An independent repo is provided to test the logic. Running the script test_ctc_beam_search_decoder.py, we can see that the two decoders produce the same decoding output. Note that the log probabilities reported by the TensorFlow decoder appear to be offset by a constant bias, which does not affect the ordering.

{tf_decoder log probs} 	 {tested_decoder log probs}:  {tf_decoder result}  {tested_decoder result}
2.734900	-3.168560:  [0 1 0]    [0, 1, 0]
2.541887	-3.361573:  [1 0]    [1, 0]
2.442779	-3.460681:  [1 0 1]    [1, 0, 1]
2.241655	-3.661805:  [0 1 0 1]    [0, 1, 0, 1]
1.963587	-3.939873:  [0 1 0 3]    [0, 1, 0, 3]
1.949989	-3.953471:  [1 0 1 0]    [1, 0, 1, 0]
1.948595	-3.954865:  [0 1 0 2]    [0, 1, 0, 2]
1.927994	-3.975465:  [0 1 0 4]    [0, 1, 0, 4]
1.870628	-4.032832:  [1 0 3]    [1, 0, 3]
1.855636	-4.047824:  [1 0 2]    [1, 0, 2]
1.835035	-4.068425:  [1 0 4]    [1, 0, 4]
1.783490	-4.119970:  [0 1]    [0, 1]
1.733639	-4.169821:  [1 1 0]    [1, 1, 0]
1.425048	-4.478412:  [1 1]    [1, 1]
1.260652	-4.642808:  [0 1 3]    [0, 1, 3]
1.245660	-4.657800:  [0 1 2]    [0, 1, 2]
1.225059	-4.678401:  [0 1 4]    [0, 1, 4]
1.164931	-4.738529:  [1 0 1 3]    [1, 0, 1, 3]
1.149939	-4.753521:  [1 0 1 2]    [1, 0, 1, 2]
1.129338	-4.774122:  [1 0 1 4]    [1, 0, 1, 4]

More validation can be done by setting different max_time_steps (<=6) and beam_size, or by changing the input probability matrix.
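
For reference, a minimal sketch of how such a random input probability matrix might be generated for the comparison; the vocabulary size is an assumption:

import numpy as np

max_time_steps = 6  # <= 6, as noted above
num_classes = 5     # assumed: 4 labels + 1 blank

# each row is a valid probability distribution over the vocabulary
probs_seq = np.random.rand(max_time_steps, num_classes)
probs_seq /= probs_seq.sum(axis=1, keepdims=True)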

@xinghai-sun (Contributor) left a comment:

  1. Please make the interface of ctc_beam_search_decoder more general to allow any external custom scorer to be used.
  2. Please carefully clean and check the code before committing.

import random
import numpy as np

# vocab = blank + space + English characters
Contributor:

Please remove unnecessary comment lines.

Collaborator Author:

done

return ids_str


def language_model(ids_list, vocabulary):
Contributor:

I think it's a "toy" language model just for testing. Please replace it with a "real" one built in pull request #71.

Collaborator Author:

done

beam_size,
vocabulary,
max_time_steps=None,
lang_model=language_model,
Contributor:

lang_model --> external_scoring_function.

  1. Please use "language_model" instead of lang_model for clarity.
  2. Not only an LM but also other custom scoring functions should be allowed. Please rename it to make this clear.

Collaborator Author:

done

vocabulary,
max_time_steps=None,
lang_model=language_model,
alpha=1.0,
Contributor:

If lang_model --> external_scoring_function, these parameters should be moved into the external_scoring_function creator.

Collaborator Author:

done
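
A minimal sketch of the suggested creator pattern, using hypothetical names; the weights are bound inside the creator, so the decoder only receives a single callable:

def make_external_scorer(alpha, beta, language_model, word_count):
    # hypothetical creator: binds the weights so the decoder
    # only ever sees one scoring callable
    def ext_scoring_func(sentence):
        return (language_model(sentence) ** alpha) * \
               (word_count(sentence) ** beta)
    return ext_scoring_func

# the decoder then takes ext_scoring_func, with no alpha/beta parameters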

space_id=1,
num_results_per_sample=None):
'''
Beam search decoder for CTC-trained network, adapted from Algorithm 1
Contributor:

"Adapted" means there is a difference? Could you please explain what the difference is?

Collaborator Author:

done

vocab = ['-', '_', 'a']


def ids_list2str(ids_list):
Contributor:

Remove lines 13-20. Please clean the code before committing.

Collaborator Author:

done

vocabulary,
method,
beam_size=None,
num_results_per_sample=None):
"""
CTC-like sequence decoding from a sequence of likelihood probabilities.

Contributor:

Since we now have more than one type of decoder, please add comments briefly explaining each one.

Collaborator Author:

done

import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
Contributor:

It is not proper to include a TensorFlow dependency. It would be better to paste the ground-truth results and just compare our results against them.

Collaborator Author:

Done

@@ -0,0 +1,69 @@
from __future__ import absolute_import
Contributor:

Should we put it in a ./test folder? What is the best practice for a Python unit test file?

Collaborator Author:

Removed the test code. Done

## This is a prototype of ctc beam search decoder

import copy
import random
Contributor:

Not used. Remove it.

Collaborator Author:

done

@@ -36,25 +38,164 @@ def ctc_best_path_decode(probs_seq, vocabulary):
return ''.join([vocabulary[index] for index in index_list])


def ctc_decode(probs_seq, vocabulary, method):
class Scorer(object):
Contributor:

Maybe we should consider extensibility. KenLM is only one of many language model tools, and each tool has its own interface. We could define a unified base class and derive a KenLMScorer from it.

@kuke (Collaborator Author), Jun 7, 2017:

If more language models are involved, the Scorer will be redesigned. For now we use a single class to avoid redundancy.
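
A minimal sketch of the base-class design under discussion; the class names here are hypothetical:

import kenlm


class BaseScorer(object):
    # unified interface: every scorer maps a sentence to a score
    def __call__(self, sentence):
        raise NotImplementedError


class KenLMScorer(BaseScorer):
    # scorer backed by a KenLM n-gram language model
    def __init__(self, model_path):
        self._language_model = kenlm.LanguageModel(model_path)

    def __call__(self, sentence):
        # kenlm returns a log10 probability; convert back to probability
        log_cond_prob = self._language_model.score(sentence, bos=True, eos=False)
        return 10.0 ** log_cond_prob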

self._beta = beta
self._language_model = kenlm.LanguageModel(model_path)

def language_model_score(self, sentence, bos=True, eos=False):
Contributor:

Special tokens should be replaced with KenLM's internal format, e.g. the end token and the unknown token. The start token should be removed from the sentence.

Collaborator Author:

The decoded prefix in the CTC decoder doesn't contain any special tokens, so the preprocessing is simplified.

return ctc_best_path_decode(probs_seq, vocabulary)
else:
raise ValueError("Decoding method [%s] is not supported.")
max_time_steps = len(probs_seq)
Contributor:

Consider replacing max_time_steps with another name (like time_step_num)? It feels somewhat confusing.

Collaborator Author:

Done

## initialize
# the set containing selected prefixes
prefix_set_prev = {'-1': 1.0}
probs_b, probs_nb = {'-1': 1.0}, {'-1': 0.0}
@pkuyym (Contributor), Jun 7, 2017:

Consider renaming probs_b and probs_nb to probs_b_prev and probs_nb_prev?

Collaborator Author:

Done
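
A minimal sketch of the renamed initialization, assuming '-1' still marks the empty prefix as in the excerpt above:

# prefixes selected so far
prefix_set_prev = {'-1': 1.0}
# probability of each prefix ending in blank / in a non-blank label
probs_b_prev, probs_nb_prev = {'-1': 1.0}, {'-1': 0.0}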

@kuke (Collaborator Author) commented Jun 15, 2017

Used grid search to find the optimal parameters alpha=0.26, beta=0.1, decreasing the WER to ~0.17.
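
A minimal sketch of such a grid search, assuming a hypothetical evaluate_wer(alpha, beta) helper that decodes a held-out set and returns its WER; the search ranges are made up:

import numpy as np

best_alpha, best_beta, best_wer = None, None, float('inf')
for alpha in np.arange(0.1, 0.5, 0.02):      # assumed LM-weight range
    for beta in np.arange(0.0, 0.3, 0.05):   # assumed word-count-weight range
        wer = evaluate_wer(alpha, beta)      # hypothetical helper
        if wer < best_wer:
            best_alpha, best_beta, best_wer = alpha, beta, wer
print("best alpha=%.2f, beta=%.2f, WER=%.4f" % (best_alpha, best_beta, best_wer))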

@kuke force-pushed the ctc_decoder_dev branch 2 times, most recently from 5c4751e to 3d292d0, on June 20, 2017 02:22
@kuke (Collaborator Author) commented Jun 20, 2017

Passed CI. With a rebuilt, more powerful language model, the WER has decreased to 13%. #115

@xinghai-sun (Contributor) left a comment:

Great work!

cutoff_prob=1.0,
ext_scoring_func=None,
nproc=False):
'''Beam search decoder for CTC-trained network, using beam search with width
Contributor:

Use """ instead of ''' for consistency.
Please also check other places for this.

Collaborator Author:

Done

from itertools import groupby
import numpy as np
import multiprocessing


def ctc_best_path_decode(probs_seq, vocabulary):
Contributor:

ctc_best_path_decode --> ctc_best_path_decoder. Please also change "decoding" to "decoder" in the function comments.

Collaborator Author:

Done

ext_scoring_func=None,
nproc=False):
'''Beam search decoder for CTC-trained network, using beam search with width
beam_size to find many paths to one label, return beam_size labels in
Contributor:

", using beam search with width find many paths to one label, return beam_size labels in the descending order" --> "It utilizes beam search to approximately select the top best decoding paths, returning results in descending order."

The original is not a complete sentence; pay particular attention to the punctuation.

Collaborator Author:

Done

'''Beam search decoder for CTC-trained network, using beam search with width
beam_size to find many paths to one label, return beam_size labels in
the descending order of probabilities. The implementation is based on Prefix
Beam Search(https://arxiv.org/abs/1408.2873), and the unclear part is
Contributor:

Beam Search( --> Beam Search (

Collaborator Author:

Done

beam_size to find many paths to one label, return beam_size labels in
the descending order of probabilities. The implementation is based on Prefix
Beam Search(https://arxiv.org/abs/1408.2873), and the unclear part is
redesigned.
Contributor:

What was redesigned and why? Could you please add a detailed explanation?

Collaborator Author:

Done

return np.power(10, log_cond_prob)

# word insertion term
def word_count(self, sentence):
Contributor:

Do not expose word_count.

Collaborator Author:

Done

self._language_model = kenlm.LanguageModel(model_path)

# n-gram language model scoring
def language_model_score(self, sentence):
Contributor:

No need to expose this scoring method.

Collaborator Author:

Done

return len(words)

# execute evaluation
def __call__(self, sentence, log=False):
Contributor:

rename to get_score

Collaborator Author:

Preserved: by using __call__, the scorer can be invoked as scorer_name(prefix), staying compatible with a plain function func_name(prefix).
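
A minimal sketch of that compatibility; the decoder body here is a hypothetical stand-in:

class Scorer(object):
    def __call__(self, sentence):
        return 1.0  # hypothetical scoring body


def plain_scorer(sentence):
    return 1.0


def decode(probs_seq, ext_scoring_func):
    # the decoder only ever calls ext_scoring_func(prefix), so a
    # Scorer instance and a plain function are interchangeable
    return ext_scoring_func("a b c")

decode(None, Scorer())
decode(None, plain_scorer)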

:param alpha: Parameter associated with language model.
:type alpha: float
:param beta: Parameter associated with word count.
:type beta: float
Contributor:

Please explain when word count is not used, e.g. "If beta = xxxx ...".

Collaborator Author:

Done

from __future__ import division
from __future__ import print_function

import paddle.v2 as paddle
Contributor:

  1. Reorder the imports.
  2. Please modify all below according to the suggestions in infer.py and evaluate.py.
  3. Add descriptions to README.md for usage of tune.py and evaluate.py.

Collaborator Author:

Done

@kuke (Collaborator Author) left a comment:

Refined. Please review again.

blank_id=0,
cutoff_prob=1.0,
ext_scoring_func=None,
nproc=False):
Collaborator Author:

Preserved temporarily, until the problem of how to pass ext_scoring_func to the multiple processes is fixed.

from model import deep_speech2
from decoder import *
from scorer import Scorer
from error_rate import wer
Collaborator Author:

Done

help="Manifest path for normalizer. (default: %(default)s)")
parser.add_argument(
"--decode_manifest_path",
default='data/manifest.libri.test-clean',
Collaborator Author:

Done

:param alpha: Parameter associated with language model.
:type alpha: float
:param beta: Parameter associated with word count.
:type beta: float
Collaborator Author:

Done

from itertools import groupby
import numpy as np
import multiprocessing


def ctc_best_path_decode(probs_seq, vocabulary):
Collaborator Author:

Done



class Scorer(object):
"""External defined scorer to evaluate a sentence in beam search
Collaborator Author:

Done

self._language_model = kenlm.LanguageModel(model_path)

# n-gram language model scoring
def language_model_score(self, sentence):
Collaborator Author:

Done

return np.power(10, log_cond_prob)

# word insertion term
def word_count(self, sentence):
Collaborator Author:

Done

return len(words)

# execute evaluation
def __call__(self, sentence, log=False):
Collaborator Author:

Preserved: by using __call__, the scorer can be invoked as scorer_name(prefix), staying compatible with a plain function func_name(prefix).

from __future__ import division
from __future__ import print_function

import paddle.v2 as paddle
Collaborator Author:

Done

@xinghai-sun (Contributor) left a comment:

Almost LGTM.

of probabilities, the assignment operation is changed to accumulation since
one prefix may come from different paths; 2) the if condition "if l^+ not
in A_prev then" after the probabilities' computation is dropped, since it is
hard to understand and seems unnecessary.
Contributor:

Can we make sure that these modifications are correct?
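
For reference, a minimal sketch of the accumulation described in the docstring, reusing the probs_b_prev/probs_nb_prev naming from earlier in this review; probs_b_cur, probs_nb_cur, prob_blank_t, and prob_c_t are hypothetical stand-ins for the per-step tables and probabilities:

# several paths can collapse to the same prefix l at time t, so the
# probabilities are accumulated (+=) rather than assigned (=) as in Algorithm 1

# extending prefix l with the blank symbol:
probs_b_cur[l] += prob_blank_t * (probs_b_prev[l] + probs_nb_prev[l])

# repeating the last character c of prefix l (CTC merges the repeat):
probs_nb_cur[l] += prob_c_t * probs_nb_prev[l]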

blank_id=0,
cutoff_prob=1.0,
ext_scoring_func=None,
nproc=False):
Contributor:

Could we fix it now?

'\t': 1.0
}, {
'\t': 0.0
}
Contributor:

No need to use so many lines. Maybe you can revert it back to just two lines.
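
Presumably the compact form meant here; the variable names are assumed from earlier excerpts:

probs_b_prev, probs_nb_prev = {'\t': 1.0}, {'\t': 0.0}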

beam_size,
vocabulary,
blank_id=0,
blank_id,
num_processes,
Contributor:

Can we set multiprocessing.cpu_count() as the default value?
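
A minimal sketch of that default, assuming the argument is exposed via argparse like the other flags in this PR:

import argparse
import multiprocessing

parser = argparse.ArgumentParser()
parser.add_argument(
    "--num_processes",
    default=multiprocessing.cpu_count(),  # use all available cores by default
    type=int,
    help="Number of processes for decoding. (default: %(default)s)")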

help="Manifest path for normalizer. (default: %(default)s)")
parser.add_argument(
"--decode_manifest_path",
default='data/manifest.libri.test-clean',
Contributor:

Still 'data/manifest.libri.test-clean'?

type=str,
help="Manifest path for decoding. (default: %(default)s)")
parser.add_argument(
"--model_filepath",
default='checkpoints/params.latest.tar.gz',
default='checkpoints/params.tar.gz.41',
Contributor:

Use 'latest' as the default.


:param alpha: Parameter associated with language model.
:param alpha: Parameter associated with language model. Don't use
language model when alpha = 0.
Contributor:

--> Language-model scorer is disabled when alpha = 0.

:type alpha: float
:param beta: Parameter associated with word count.
:param beta: Parameter associated with word count. Don't use word
count when beta = 0.
Contributor:

Word-count scorer is disabled when beta = 0.

lm = self.language_model_score(sentence)
word_cnt = self.word_count(sentence)
lm = self._language_model_score(sentence)
word_cnt = self._word_count(sentence)
if log == False:
score = np.power(lm, self._alpha) \
* np.power(word_cnt, self._beta)
Contributor:

Is it possible to put L60 and L61 into a single line within 80 columns?

@@ -0,0 +1,3 @@
echo "Downloading language model."

wget -c ftp://xxx/xxx/en.00.UNKNOWN.klm -P ./data
Contributor:

Could you replace it with a real URL?

@xinghai-sun merged commit de86560 into PaddlePaddle:develop on Jul 5, 2017
Successfully merging this pull request may close these issues.

Add CTC-LM-beam-search decoder.
4 participants