add ctc beam search decoder #59

Merged: 31 commits into PaddlePaddle:develop on Jul 5, 2017

Conversation

@kuke (Collaborator) commented Jun 1, 2017

resolve PaddlePaddle/Paddle#2230

In progress. Add pseudo code and test information later.

@kuke (Collaborator Author) commented Jun 2, 2017

The algorithm in the prefix beam search paper turns out to be quite confusing and may have some problems in its details. So here is a modification of Algorithm 1, which the code is based on:

from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops

# pack the list of per-step probability arrays into a single tensor
inputs_t = [ops.convert_to_tensor(x) for x in inputs]
inputs_t = array_ops.stack(inputs_t)

# run CTC beam search decoder in tensorflow
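
Presumably the test then continues with the TensorFlow decoder call. A minimal sketch of what that comparison call might look like, assuming TF 1.x; the sequence length, beam width, and top_paths values here are made-up examples, not taken from the PR:

import numpy as np
import tensorflow as tf

# hypothetical toy input: max_time=6, batch=1, num_classes=5 (4 labels + blank)
probs = np.random.rand(6, 1, 5).astype(np.float32)
probs /= probs.sum(axis=2, keepdims=True)

# shape [max_time, batch, num_classes]; log-probabilities are valid logits here
inputs_t = tf.log(tf.constant(probs))
decoded, log_probs = tf.nn.ctc_beam_search_decoder(
    inputs=inputs_t,
    sequence_length=[6],  # assumed: one sequence of full length
    beam_width=20,        # assumed beam size
    top_paths=20)         # return all beams for comparison

with tf.Session() as sess:
    results, scores = sess.run([decoded, log_probs])
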
Contributor:

May I ask why the unit test is written using TensorFlow?

Collaborator Author:

Just to compare the results against TensorFlow.

@kuke (Collaborator Author) commented Jun 2, 2017

Validate the implementation

To confirm its correctness, the implementation is compared with the ctc_beam_search_decoder in TensorFlow under the same input probability matrix and beam size. An independent repo is provided to test the logic. Running the script test_ctc_beam_search_decoder.py, we can see that the two decoders produce the same decoding output. Note that the log probabilities reported by the TensorFlow decoder appear to be offset by a constant bias, which does not affect the ordering.

{tf_decoder log probs} 	 {tested_decoder log probs}:  {tf_decoder result}  {tested_decoder result}
2.734900	-3.168560:  [0 1 0]    [0, 1, 0]
2.541887	-3.361573:  [1 0]    [1, 0]
2.442779	-3.460681:  [1 0 1]    [1, 0, 1]
2.241655	-3.661805:  [0 1 0 1]    [0, 1, 0, 1]
1.963587	-3.939873:  [0 1 0 3]    [0, 1, 0, 3]
1.949989	-3.953471:  [1 0 1 0]    [1, 0, 1, 0]
1.948595	-3.954865:  [0 1 0 2]    [0, 1, 0, 2]
1.927994	-3.975465:  [0 1 0 4]    [0, 1, 0, 4]
1.870628	-4.032832:  [1 0 3]    [1, 0, 3]
1.855636	-4.047824:  [1 0 2]    [1, 0, 2]
1.835035	-4.068425:  [1 0 4]    [1, 0, 4]
1.783490	-4.119970:  [0 1]    [0, 1]
1.733639	-4.169821:  [1 1 0]    [1, 1, 0]
1.425048	-4.478412:  [1 1]    [1, 1]
1.260652	-4.642808:  [0 1 3]    [0, 1, 3]
1.245660	-4.657800:  [0 1 2]    [0, 1, 2]
1.225059	-4.678401:  [0 1 4]    [0, 1, 4]
1.164931	-4.738529:  [1 0 1 3]    [1, 0, 1, 3]
1.149939	-4.753521:  [1 0 1 2]    [1, 0, 1, 2]
1.129338	-4.774122:  [1 0 1 4]    [1, 0, 1, 4]

More validation can be done by setting different max_time_steps (<=6) and beam_size, or by changing the input probability matrix.
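
For reference, a minimal sketch of how such a random input probability matrix might be generated for the comparison; the vocabulary size is an assumption:

import numpy as np

max_time_steps = 6  # <= 6, as noted above
num_classes = 5     # assumed: 4 labels + 1 blank

# each row is a valid probability distribution over the vocabulary
probs_seq = np.random.rand(max_time_steps, num_classes)
probs_seq /= probs_seq.sum(axis=1, keepdims=True)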

@xinghai-sun (Contributor) left a comment:

  1. Please make the interface of ctc_beam_search_decoder more general to allow any external custom scorer to be used.
  2. Please carefully clean and check the code before committing.

import random
import numpy as np

# vocab = blank + space + English characters
Contributor:

Please remove unnecessary comment lines.

Collaborator Author:

done

return ids_str


def language_model(ids_list, vocabulary):
Contributor:

I think it's a "toy" language model just for testing. Please replace it with a "real" one built in pull request #71.

Collaborator Author:

done

beam_size,
vocabulary,
max_time_steps=None,
lang_model=language_model,
Contributor:

lang_model --> external_scoring_function.

  1. Please use "language_model" instead of lang_model for clarity.
  2. Not only an LM but also other custom scoring functions should be allowed. Please rename it to make this clear.

Collaborator Author:

done

vocabulary,
max_time_steps=None,
lang_model=language_model,
alpha=1.0,
Contributor:

If lang_model --> external_scoring_function, these parameters should be moved into the external_scoring_function creator.

Collaborator Author:

done
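
A minimal sketch of the suggested creator pattern, using hypothetical names; the weights are bound inside the creator, so the decoder only receives a single callable:

def make_external_scorer(alpha, beta, language_model, word_count):
    # hypothetical creator: binds the weights so the decoder
    # only ever sees one scoring callable
    def ext_scoring_func(sentence):
        return (language_model(sentence) ** alpha) * \
               (word_count(sentence) ** beta)
    return ext_scoring_func

# the decoder then takes ext_scoring_func, with no alpha/beta parameters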

space_id=1,
num_results_per_sample=None):
'''
Beam search decoder for CTC-trained network, adapted from Algorithm 1
Contributor:

"Adapted" means there is a difference? Could you please explain what the difference is?

Collaborator Author:

done

vocab = ['-', '_', 'a']


def ids_list2str(ids_list):
Contributor:

Remove lines 13-20. Please clean the code before committing.

Collaborator Author:

done

vocabulary,
method,
beam_size=None,
num_results_per_sample=None):
"""
CTC-like sequence decoding from a sequence of likelihood probabilities.

Contributor:

Since we now have more than one type of decoder, please add comments briefly explaining each one.

Collaborator Author:

done

import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
Contributor:

It is not proper to include a TensorFlow dependency. It would be better to paste the ground-truth results and just compare our results against them.

Collaborator Author:

Done

@@ -0,0 +1,69 @@
from __future__ import absolute_import
Contributor:

Should we put it in a ./test folder? What is the best practice for a Python unit test file?

Collaborator Author:

Removed the test code. Done

## This is a prototype of ctc beam search decoder

import copy
import random
Contributor:

Not used. Remove it.

Collaborator Author:

done

@@ -36,25 +38,164 @@ def ctc_best_path_decode(probs_seq, vocabulary):
return ''.join([vocabulary[index] for index in index_list])


def ctc_decode(probs_seq, vocabulary, method):
class Scorer(object):
Contributor:

Maybe we should consider extensibility. KenLM is only one of many language model tools, and each tool has its own interface. We could define a unified base class and derive a KenLMScorer from it.

@kuke (Collaborator Author), Jun 7, 2017:

If more language models are involved, the Scorer will be redesigned. For now we use a single class to avoid redundancy.
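
A minimal sketch of the base-class design under discussion; the class names here are hypothetical:

import kenlm


class BaseScorer(object):
    # unified interface: every scorer maps a sentence to a score
    def __call__(self, sentence):
        raise NotImplementedError


class KenLMScorer(BaseScorer):
    # scorer backed by a KenLM n-gram language model
    def __init__(self, model_path):
        self._language_model = kenlm.LanguageModel(model_path)

    def __call__(self, sentence):
        # kenlm returns a log10 probability; convert back to probability
        log_cond_prob = self._language_model.score(sentence, bos=True, eos=False)
        return 10.0 ** log_cond_prob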

self._beta = beta
self._language_model = kenlm.LanguageModel(model_path)

def language_model_score(self, sentence, bos=True, eos=False):
Contributor:

Special tokens should be replaced with KenLM's internal format, e.g. the end token and the unknown token. The start token should be removed from the sentence.

Collaborator Author:

The decoded prefix in the CTC decoder doesn't contain any special tokens, so the preprocessing is simplified.

return ctc_best_path_decode(probs_seq, vocabulary)
else:
raise ValueError("Decoding method [%s] is not supported.")
max_time_steps = len(probs_seq)
Contributor:

Consider replacing max_time_steps with another name (like time_step_num)? It feels somewhat confusing.

Collaborator Author:

Done

## initialize
# the set containing selected prefixes
prefix_set_prev = {'-1': 1.0}
probs_b, probs_nb = {'-1': 1.0}, {'-1': 0.0}
@pkuyym (Contributor), Jun 7, 2017:

Consider renaming probs_b and probs_nb to probs_b_prev and probs_nb_prev?

Collaborator Author:

Done
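
A minimal sketch of the renamed initialization, assuming '-1' still marks the empty prefix as in the excerpt above:

# prefixes selected so far
prefix_set_prev = {'-1': 1.0}
# probability of each prefix ending in blank / in a non-blank label
probs_b_prev, probs_nb_prev = {'-1': 1.0}, {'-1': 0.0}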

@kuke (Collaborator Author) commented Jun 15, 2017

Used grid search to find the optimal parameters alpha=0.26, beta=0.1, decreasing the WER to ~0.17.
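
A minimal sketch of such a grid search, assuming a hypothetical evaluate_wer(alpha, beta) helper that decodes a held-out set and returns its WER; the search ranges are made up:

import numpy as np

best_alpha, best_beta, best_wer = None, None, float('inf')
for alpha in np.arange(0.1, 0.5, 0.02):      # assumed LM-weight range
    for beta in np.arange(0.0, 0.3, 0.05):   # assumed word-count-weight range
        wer = evaluate_wer(alpha, beta)      # hypothetical helper
        if wer < best_wer:
            best_alpha, best_beta, best_wer = alpha, beta, wer
print("best alpha=%.2f, beta=%.2f, WER=%.4f" % (best_alpha, best_beta, best_wer))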

@kuke force-pushed the ctc_decoder_dev branch 2 times, most recently from 5c4751e to 3d292d0, on June 20, 2017 02:22
@kuke (Collaborator Author) commented Jun 20, 2017

Passed CI. With a rebuilt, more powerful language model, the WER has decreased to 13%. #115

@xinghai-sun (Contributor) left a comment:

Great work!

cutoff_prob=1.0,
ext_scoring_func=None,
nproc=False):
'''Beam search decoder for CTC-trained network, using beam search with width
Contributor:

Use """ instead of ''' for consistency.
Please also check other places for this.

Collaborator Author:

Done

from itertools import groupby
import numpy as np
import multiprocessing


def ctc_best_path_decode(probs_seq, vocabulary):
Contributor:

ctc_best_path_decode --> ctc_best_path_decoder. Please also change "decoding" to "decoder" in the function comments.

Collaborator Author:

Done

ext_scoring_func=None,
nproc=False):
'''Beam search decoder for CTC-trained network, using beam search with width
beam_size to find many paths to one label, return beam_size labels in
Contributor:

", using beam search with width find many paths to one label, return beam_size labels in the descending order" --> "It utilizes beam search to approximately select the top best decoding paths, returning results in descending order."

The original is not a complete sentence; pay particular attention to the punctuation.

Collaborator Author:

Done

'''Beam search decoder for CTC-trained network, using beam search with width
beam_size to find many paths to one label, return beam_size labels in
the descending order of probabilities. The implementation is based on Prefix
Beam Search(https://arxiv.org/abs/1408.2873), and the unclear part is
Contributor:

Beam Search( --> Beam Search (

Collaborator Author:

Done

beam_size to find many paths to one label, return beam_size labels in
the descending order of probabilities. The implementation is based on Prefix
Beam Search(https://arxiv.org/abs/1408.2873), and the unclear part is
redesigned.
Contributor:

What was redesigned and why? Could you please add a detailed explanation?

Collaborator Author:

Done

return np.power(10, log_cond_prob)

# word insertion term
def word_count(self, sentence):
Contributor:

Do not expose word_count.

Collaborator Author:

Done

self._language_model = kenlm.LanguageModel(model_path)

# n-gram language model scoring
def language_model_score(self, sentence):
Contributor:

No need to expose this scoring method.

Collaborator Author:

Done

return len(words)

# execute evaluation
def __call__(self, sentence, log=False):
Contributor:

rename to get_score

Collaborator Author:

Preserved: by using __call__, the scorer can be invoked as scorer_name(prefix), staying compatible with a plain function func_name(prefix).
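
A minimal sketch of that compatibility; the decoder body here is a hypothetical stand-in:

class Scorer(object):
    def __call__(self, sentence):
        return 1.0  # hypothetical scoring body


def plain_scorer(sentence):
    return 1.0


def decode(probs_seq, ext_scoring_func):
    # the decoder only ever calls ext_scoring_func(prefix), so a
    # Scorer instance and a plain function are interchangeable
    return ext_scoring_func("a b c")

decode(None, Scorer())
decode(None, plain_scorer)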

:param alpha: Parameter associated with language model.
:type alpha: float
:param beta: Parameter associated with word count.
:type beta: float
Contributor:

Please explain when word count is not used, e.g. "If beta = xxxx ...".

Collaborator Author:

Done

from __future__ import division
from __future__ import print_function

import paddle.v2 as paddle
Contributor:

  1. Reorder the imports.
  2. Please modify all below according to the suggestions in infer.py and evaluate.py.
  3. Add descriptions to README.md for usage of tune.py and evaluate.py.

Collaborator Author:

Done

@kuke (Collaborator Author) left a comment:

Refined. Please review again.

blank_id=0,
cutoff_prob=1.0,
ext_scoring_func=None,
nproc=False):
Collaborator Author:

Preserved temporarily, until the problem of how to pass ext_scoring_func to the multiple processes is fixed.

from model import deep_speech2
from decoder import *
from scorer import Scorer
from error_rate import wer
Collaborator Author:

Done

help="Manifest path for normalizer. (default: %(default)s)")
parser.add_argument(
"--decode_manifest_path",
default='data/manifest.libri.test-clean',
Collaborator Author:

Done

:param alpha: Parameter associated with language model.
:type alpha: float
:param beta: Parameter associated with word count.
:type beta: float
Collaborator Author:

Done

from itertools import groupby
import numpy as np
import multiprocessing


def ctc_best_path_decode(probs_seq, vocabulary):
Collaborator Author:

Done



class Scorer(object):
"""External defined scorer to evaluate a sentence in beam search
Collaborator Author:

Done

self._language_model = kenlm.LanguageModel(model_path)

# n-gram language model scoring
def language_model_score(self, sentence):
Collaborator Author:

Done

return np.power(10, log_cond_prob)

# word insertion term
def word_count(self, sentence):
Collaborator Author:

Done

return len(words)

# execute evaluation
def __call__(self, sentence, log=False):
Collaborator Author:

Preserved: by using __call__, the scorer can be invoked as scorer_name(prefix), staying compatible with a plain function func_name(prefix).

from __future__ import division
from __future__ import print_function

import paddle.v2 as paddle
Collaborator Author:

Done

@xinghai-sun (Contributor) left a comment:

Almost LGTM.

of probabilities, the assignment operation is changed to accumulation since
one prefix may come from different paths; 2) the if condition "if l^+ not
in A_prev then" after the probabilities' computation is dropped, since it is
hard to understand and seems unnecessary.
Contributor:

Can we make sure that these modifications are correct?
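
For reference, a minimal sketch of the accumulation described in the docstring, reusing the probs_b_prev/probs_nb_prev naming from earlier in this review; probs_b_cur, probs_nb_cur, prob_blank_t, and prob_c_t are hypothetical stand-ins for the per-step tables and probabilities:

# several paths can collapse to the same prefix l at time t, so the
# probabilities are accumulated (+=) rather than assigned (=) as in Algorithm 1

# extending prefix l with the blank symbol:
probs_b_cur[l] += prob_blank_t * (probs_b_prev[l] + probs_nb_prev[l])

# repeating the last character c of prefix l (CTC merges the repeat):
probs_nb_cur[l] += prob_c_t * probs_nb_prev[l]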

blank_id=0,
cutoff_prob=1.0,
ext_scoring_func=None,
nproc=False):
Contributor:

Could we fix it now?

'\t': 1.0
}, {
'\t': 0.0
}
Contributor:

No need to use so many lines. Maybe you can revert it back to just two lines.
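
Presumably the compact form meant here; the variable names are assumed from earlier excerpts:

probs_b_prev, probs_nb_prev = {'\t': 1.0}, {'\t': 0.0}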

beam_size,
vocabulary,
blank_id=0,
blank_id,
num_processes,
Contributor:

Can we set multiprocessing.cpu_count() as the default value?
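
A minimal sketch of that default, assuming the argument is exposed via argparse like the other flags in this PR:

import argparse
import multiprocessing

parser = argparse.ArgumentParser()
parser.add_argument(
    "--num_processes",
    default=multiprocessing.cpu_count(),  # use all available cores by default
    type=int,
    help="Number of processes for decoding. (default: %(default)s)")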

help="Manifest path for normalizer. (default: %(default)s)")
parser.add_argument(
"--decode_manifest_path",
default='data/manifest.libri.test-clean',
Contributor:

Still 'data/manifest.libri.test-clean'?

type=str,
help="Manifest path for decoding. (default: %(default)s)")
parser.add_argument(
"--model_filepath",
default='checkpoints/params.latest.tar.gz',
default='checkpoints/params.tar.gz.41',
Contributor:

Use 'latest' as the default.


:param alpha: Parameter associated with language model.
:param alpha: Parameter associated with language model. Don't use
language model when alpha = 0.
Contributor:

--> Language-model scorer is disabled when alpha = 0.

:type alpha: float
:param beta: Parameter associated with word count.
:param beta: Parameter associated with word count. Don't use word
count when beta = 0.
Contributor:

Word-count scorer is disabled when beta = 0.

lm = self.language_model_score(sentence)
word_cnt = self.word_count(sentence)
lm = self._language_model_score(sentence)
word_cnt = self._word_count(sentence)
if log == False:
score = np.power(lm, self._alpha) \
* np.power(word_cnt, self._beta)
Contributor:

Is it possible to put L60 and L61 into a single line within 80 columns?

@@ -0,0 +1,3 @@
echo "Downloading language model."

wget -c ftp://xxx/xxx/en.00.UNKNOWN.klm -P ./data
Contributor:

Could you replace it with a real URL?

@xinghai-sun merged commit de86560 into PaddlePaddle:develop on Jul 5, 2017
Successfully merging this pull request may close these issues.

Add CTC-LM-beam-search decoder.
4 participants