Add audio data provider and a simplified DeepSpeech2 model configuration. #55

xinghai-sun · 2017-05-24T17:34:09Z

…2 model configuration. Bug exists when running training.

xinghai-sun · 2017-05-26T10:25:28Z

deep_speech_2/train.py

+            feeding=feeding)
+        args.num_passes -= 1
+    # other passes without sortagrad
+    trainer.train(


If args.use_sortagrad is true, trainer.train is called twice. However, the second trainer.train call gets stuck (no progress, no error). Does trainer.train not support being called more than once?

I think the training process should expose more fine-grained control interfaces to users, so the training flow can be controlled more flexibly. For example, in this DS2 case the training data needs to change during training. In other cases, parts of the model need to be frozen for a while or trained alternately (e.g. GANs).
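One possible workaround, sketched here under assumptions, is to fold SortaGrad into the data reader so that trainer.train is called only once. The sketch assumes, as in PaddlePaddle v2, that the trainer invokes the reader creator once at the start of every pass; `make_sortagrad_reader` and the `duration` key are hypothetical names, not code from this PR. The first call yields batches sorted by utterance duration, and every later call yields shuffled batches.

```python
import random


def make_sortagrad_reader(samples, batch_size, seed=0):
    """Hypothetical batch-reader creator with SortaGrad on the first pass only.

    `samples` is assumed to be a list of dicts, each carrying a "duration" key.
    """
    rng = random.Random(seed)
    num_calls = [0]  # mutable counter shared across calls of reader()

    def reader():
        items = list(samples)
        if num_calls[0] == 0:
            # First pass: shortest utterances first (SortaGrad).
            items.sort(key=lambda s: s["duration"])
        else:
            # Later passes: plain shuffling.
            rng.shuffle(items)
        num_calls[0] += 1
        for start in range(0, len(items), batch_size):
            yield items[start:start + batch_size]

    return reader
```

With such a reader, a single trainer.train call with num_passes=args.num_passes would presumably cover both the sorted first pass and the shuffled later passes, avoiding the second call that hangs.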

lcy-seso

almost LGTM.

lcy-seso · 2017-05-30T03:04:02Z

deep_speech_2/model.py

+
+    forward = paddle.layer.recurrent_group(
+        step=__simple_rnn_step__, input=input)
+    return forward


This has been fixed; this code needs to be updated accordingly.

lcy-seso · 2017-05-30T03:09:42Z

deep_speech_2/train.py

+        if isinstance(event, paddle.event.EndPass):
+            result = trainer.test(reader=test_batch_reader, feeding=feeding)
+            print "Pass: %d, TestCost: %s" % (event.pass_id, result.cost)
+            with gzip.open("params.tar.gz", 'w') as f:


Save the trained model under a name that includes the pass number; otherwise, each newly saved model overwrites the previous one.

Since overfitting rarely happens in DS2, it is not necessary to save multiple models with the pass index. For now, keeping only the latest one is enough. We might add it in the future if necessary.
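If per-pass saving is added later, it mostly comes down to the file name. A minimal sketch, assuming a hypothetical `checkpoint_path` helper (not part of this PR) that keeps today's single-file behaviour by default:

```python
import os


def checkpoint_path(save_dir, pass_id=None):
    """Hypothetical helper for naming saved parameter archives.

    pass_id=None keeps the current behaviour (one "params.tar.gz" that is
    overwritten every pass); an integer pass_id gives each pass its own file.
    """
    if pass_id is None:
        return os.path.join(save_dir, "params.tar.gz")
    return os.path.join(save_dir, "params.pass_%05d.tar.gz" % pass_id)
```

In the EndPass handler, the name passed to gzip.open would then become checkpoint_path(save_dir, event.pass_id), and infer.py would load whichever pass is chosen.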

lcy-seso · 2017-05-30T03:21:07Z

deep_speech_2/model.py

+        size=dict_size + 1,
+        blank=dict_size,
+        norm_by_times=True)
+    # max decoder


If max_id is not needed in training, I think it should be put into a testing branch.

Will refactor this part later, together with beam search decoder.

lcy-seso · 2017-05-30T03:26:49Z

deep_speech_2/infer.py

+        rnn_size=args.rnn_layer_size)
+
+    # load parameters
+    parameters = paddle.parameters.Parameters.from_tar(


Save / Load models according to the pass index.

Since overfitting rarely happens in DS2, it is not necessary to save multiple models with the pass index. For now, keeping only the latest one is enough. We might add it in the future if necessary.

qingqing01 · 2017-06-01T07:06:43Z

deep_speech_2/librispeech.py

+
+URL_TEST = "http://www.openslr.org/resources/12/test-clean.tar.gz"
+URL_DEV = "http://www.openslr.org/resources/12/dev-clean.tar.gz"
+URL_TRAIN = "http://www.openslr.org/resources/12/train-clean-100.tar.gz"


Add MD5 checksums to verify the downloaded data.
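A minimal sketch of such a check using only hashlib. md5file and verify_download are hypothetical helpers, and the expected digests would have to be the published checksums for each archive (one per URL_TEST, URL_DEV and URL_TRAIN); none are filled in here.

```python
import hashlib
import os


def md5file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify_download(path, expected_md5):
    """Raise IOError if `path` is missing or its MD5 differs from `expected_md5`."""
    if not os.path.exists(path):
        raise IOError("File not found: %s" % path)
    actual = md5file(path)
    if actual != expected_md5:
        raise IOError("MD5 mismatch for %s: expected %s, got %s"
                      % (path, expected_md5, actual))
    return True
```

Reading in chunks keeps memory flat even for the multi-gigabyte training archive.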

qingqing01 · 2017-06-01T07:09:47Z

deep_speech_2/audio_data_utils.py

+
+class DataGenerator(object):
+    """
+    DataGenerator provides basic audio data preprocessing pipeline, and offer


offer -> offers

qingqing01 · 2017-06-01T07:26:55Z

deep_speech_2/librispeech.py

+    return target_dir
+
+
+def create_manifest(data_dir, manifest_path):


What does "manifest" mean? Please add some comments.
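A manifest, in the sense presumably meant here, is an index file listing one utterance per line so that the data provider does not have to walk the dataset directory. The sketch below assumes a JSON-lines layout with audio_filepath, duration (seconds) and text keys; these key names and both helpers are assumptions for illustration, not taken from the diff.

```python
import json


def write_manifest(entries, manifest_path):
    """Write (audio_filepath, duration, text) tuples as one JSON object per line."""
    with open(manifest_path, "w") as f:
        for audio_filepath, duration, text in entries:
            f.write(json.dumps({"audio_filepath": audio_filepath,
                                "duration": duration,
                                "text": text}) + "\n")


def read_manifest(manifest_path, max_duration=float("inf")):
    """Load manifest entries, skipping utterances longer than max_duration."""
    items = []
    with open(manifest_path) as f:
        for line in f:
            item = json.loads(line)
            if item["duration"] <= max_duration:
                items.append(item)
    return items
```

Storing the duration in the manifest also makes duration filtering and SortaGrad ordering cheap, since no audio needs to be opened.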

qingqing01 · 2017-06-01T07:49:59Z

deep_speech_2/model.py

+    fc = paddle.layer.fc(
+        input=rnn_group_output,
+        size=dict_size + 1,
+        act=paddle.activation.Linear(),


Note: the activation should be softmax in inference mode.

The current code contains only the best-path decoder, which does not require a softmax activation: softmax is monotonic, so the per-frame argmax is the same with or without it.
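To illustrate the point, here is a plain-Python sketch of best-path (greedy) CTC decoding: take the argmax of each frame, collapse repeated labels, then drop blanks. It assumes the blank label sits at index dict_size, matching blank=dict_size in the CTC cost hunk quoted earlier; best_path_decode and vocab are hypothetical names, and this is not PaddlePaddle's own decoder.

```python
import math


def softmax(row):
    """Numerically stable softmax over one frame of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    return max(range(len(row)), key=lambda i: row[i])


def best_path_decode(frames, vocab):
    """Greedy CTC decoding; the blank label is assumed to be index len(vocab)."""
    blank = len(vocab)
    out = []
    prev = None
    for label in (argmax(frame) for frame in frames):
        # Emit only when the label changes and is not blank.
        if label != prev and label != blank:
            out.append(vocab[label])
        prev = label
    return "".join(out)
```

Feeding raw linear outputs or their softmax gives the same string, which is why the linear activation is enough for this decoder; a beam search decoder that combines probabilities would need the softmax.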

qingqing01 · 2017-06-01T07:54:33Z

deep_speech_2/requirements.sh

+pip install soundfile
+
+# For Ubuntu only
+apt-get install libsndfile1


This needs to be explained in the documentation.

For the Python packages, you could provide a requires.txt instead.

pkuyym

Almost LGTM.

pkuyym · 2017-06-02T08:51:03Z

deep_speech_2/audio_data_utils.py

+                 stride_ms=10.0,
+                 window_ms=20.0,
+                 max_frequency=None):
+        self.__max_duration__ = max_duration


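In Python, it is best not to use __XX__ names for your own functions and variables, because __init__, __del__, etc. follow Python's built-in naming convention (same below). For private functions or variables, a single leading underscore is enough. As I recall, __XX is private and not accessible from subclasses, while _XX is also private but accessible from subclasses.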

This follows the current PaddlePaddle Python conventions. How about discussing it with everyone before changing it?
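For reference, the sketch below shows how Python treats the three naming styles discussed here. A single leading underscore is only a convention. A double leading underscore triggers name mangling, so __private set in Base is stored as _Base__private and is not reachable as self.__private from a subclass. A name with trailing double underscores, like the __max_duration__ style in the diff, is not mangled at all. Base and Child are illustrative names only.

```python
class Base(object):
    def __init__(self):
        self._protected = 1  # convention only: "internal", still plainly accessible
        self.__private = 2   # name-mangled: stored as self._Base__private
        self.__dunder__ = 3  # trailing "__" disables mangling: stored as __dunder__


class Child(Base):
    def read(self):
        # Both names resolve normally inside a subclass.
        return self._protected, self.__dunder__

    def read_private(self):
        # Mangled to self._Child__private, which does not exist -> AttributeError.
        return self.__private
```

So the reviewer's recollection matches the language rules: _XX stays visible to subclasses, __XX effectively does not, and __XX__ gets no special protection while colliding with the namespace Python uses for special methods.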

pkuyym · 2017-06-02T08:54:16Z

deep_speech_2/audio_data_utils.py

+        self.__stride_ms__ = stride_ms
+        self.__window_ms__ = window_ms
+        self.__max_frequency__ = max_frequency
+        self.__random__ = random.Random(RANDOM_SEED)


Should we consider exposing an interface that lets users specify the random seed?

RANDOM_SEED is not important; I don't recommend exposing it.

pkuyym · 2017-06-02T09:52:38Z

deep_speech_2/model.py

+        norm_by_times=True)
+    # max decoder
+    max_id = paddle.layer.max_id(input=fc)
+    return cost, max_id


You could return cost or max_id depending on whether it is training or inference. For inference, cost shouldn't be needed, right?

Will refactor this part later when merging with beam search decoder.

pkuyym · 2017-06-02T09:53:17Z

deep_speech_2/requirements.sh

+pip install soundfile
+
+# For Ubuntu only
+apt-get install libsndfile1


For the Python packages, you could provide a requires.txt instead.

kuke

almost LGTM

kuke · 2017-06-02T10:52:14Z

deep_speech_2/train.py

+    "--use_gpu", default=True, type=bool, help="Use gpu or not.")
+parser.add_argument(
+    "--use_sortagrad", default=False, type=bool, help="Use sortagrad or not.")
+parser.add_argument(


"--trainer" is already defined on line 20 with the same help message. Is this a duplicate definition?

Yes. Removed.
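On the duplicate-flag question: with argparse's default conflict handler, registering the same option string twice raises argparse.ArgumentError, so a literal duplicate fails at startup. The same hunk also uses type=bool, which argparse applies as bool(string), so "--use_gpu False" would still yield True. The sketch below shows both behaviours with a hypothetical str2bool converter; build_parser, str2bool and has_duplicate_flag are illustrative names, not code from the PR.

```python
import argparse


def str2bool(value):
    """Parse true/false-style strings; type=bool would map any non-empty string,
    including "False", to True."""
    if value.lower() in ("true", "t", "yes", "y", "1"):
        return True
    if value.lower() in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("boolean value expected, got %r" % value)


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical DS2 training flags.")
    parser.add_argument(
        "--use_gpu", default=True, type=str2bool, help="Use gpu or not.")
    parser.add_argument(
        "--use_sortagrad", default=False, type=str2bool, help="Use sortagrad or not.")
    return parser


def has_duplicate_flag(parser, flag):
    """Return True if `flag` is already registered (argparse refuses to re-add it)."""
    try:
        parser.add_argument(flag, help="duplicate")
    except argparse.ArgumentError:
        return True
    return False
```

Note that has_duplicate_flag actually registers the flag when it is not a duplicate, so it is only meant as a demonstration on a throwaway parser.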

xinghai-sun

Done. Thanks for the review!

xinghai-sun · 2017-06-02T08:19:50Z

deep_speech_2/infer.py

+        rnn_size=args.rnn_layer_size)
+
+    # load parameters
+    parameters = paddle.parameters.Parameters.from_tar(


Since overfitting rarely happens in DS2, it is not necessary to save multiple models with the pass index. For now, keeping only the latest one is enough. We might add it in the future if necessary.

xinghai-sun · 2017-06-02T08:52:24Z

deep_speech_2/librispeech.py

+    return target_dir
+
+
+def create_manifest(data_dir, manifest_path):


xinghai-sun · 2017-06-02T08:52:32Z

deep_speech_2/librispeech.py

+
+URL_TEST = "http://www.openslr.org/resources/12/test-clean.tar.gz"
+URL_DEV = "http://www.openslr.org/resources/12/dev-clean.tar.gz"
+URL_TRAIN = "http://www.openslr.org/resources/12/train-clean-100.tar.gz"


xinghai-sun · 2017-06-02T08:52:50Z

deep_speech_2/model.py

+
+    forward = paddle.layer.recurrent_group(
+        step=__simple_rnn_step__, input=input)
+    return forward


xinghai-sun · 2017-06-02T08:53:52Z

deep_speech_2/model.py

+    fc = paddle.layer.fc(
+        input=rnn_group_output,
+        size=dict_size + 1,
+        act=paddle.activation.Linear(),


The current code contains only the best-path decoder, which does not require a softmax activation.

xinghai-sun · 2017-06-02T08:58:09Z

deep_speech_2/model.py

+        size=dict_size + 1,
+        blank=dict_size,
+        norm_by_times=True)
+    # max decoder


Will refactor this part later, together with beam search decoder.

xinghai-sun · 2017-06-02T12:50:37Z

deep_speech_2/requirements.sh

+pip install soundfile
+
+# For Ubuntu only
+apt-get install libsndfile1


xinghai-sun · 2017-06-02T12:51:12Z

deep_speech_2/train.py

+        if isinstance(event, paddle.event.EndPass):
+            result = trainer.test(reader=test_batch_reader, feeding=feeding)
+            print "Pass: %d, TestCost: %s" % (event.pass_id, result.cost)
+            with gzip.open("params.tar.gz", 'w') as f:


Since overfitting rarely happens in DS2, it is not necessary to save multiple models with the pass index. For now, keeping only the latest one is enough. We might add it in the future if necessary.

xinghai-sun

Thanks for the review.

xinghai-sun · 2017-06-02T12:57:05Z

deep_speech_2/audio_data_utils.py

+
+class DataGenerator(object):
+    """
+    DataGenerator provides basic audio data preprocessing pipeline, and offer


xinghai-sun · 2017-06-02T12:58:30Z

deep_speech_2/audio_data_utils.py

+                 stride_ms=10.0,
+                 window_ms=20.0,
+                 max_frequency=None):
+        self.__max_duration__ = max_duration


This follows the current PaddlePaddle Python conventions. How about discussing it with everyone before changing it?

xinghai-sun · 2017-06-02T12:58:55Z

deep_speech_2/audio_data_utils.py

+        self.__stride_ms__ = stride_ms
+        self.__window_ms__ = window_ms
+        self.__max_frequency__ = max_frequency
+        self.__random__ = random.Random(RANDOM_SEED)


RANDOM_SEED is not important; I don't recommend exposing it.

xinghai-sun · 2017-06-02T13:00:01Z

deep_speech_2/model.py

+        norm_by_times=True)
+    # max decoder
+    max_id = paddle.layer.max_id(input=fc)
+    return cost, max_id


Will refactor this part later when merging with beam search decoder.

2. Fix incorrect batch-norm usage in RNN.
3. Fix overlapping train/dev/test manifests.
4. Update README.md and requirements.txt.
5. Expose more arguments to users via argparse.
6. Update all other details.

xinghai-sun · 2017-06-02T13:30:52Z

The model now runs smoothly, with good convergence and reasonable decoding results.

lcy-seso

LGTM

Add deep_speech_2 folder.

2397a30

This was referenced May 24, 2017

Add audio data provider and preprocessor for speech recognition datasets. PaddlePaddle/Paddle#2226

Closed

Add simplified model configuration for DeepSpeech2. PaddlePaddle/Paddle#2231

Closed

Add librispeech dataset, audio data provider and simplified DeepSpeech…

9ae22c3

…2 model configuration. Bug exists when running training.

xinghai-sun force-pushed the ds2 branch from 504d3d0 to 9ae22c3 May 25, 2017 02:22

xinghai-sun added 3 commits May 25, 2017 22:11

Add inference and add SortaGrad for only the first pass.

47b706c

Add function docs.

7739b52

Update some parameters and comments.

f33f742

xinghai-sun requested review from kuke, pkuyym, lcy-seso, luotao1, qingqing01 and chrisxu2016 May 26, 2017 10:08

xinghai-sun commented May 26, 2017

lcy-seso requested changes May 30, 2017

Refactor data utils into a class and add feature normalization.

f6d820e

qingqing01 reviewed Jun 1, 2017

pkuyym reviewed Jun 2, 2017

kuke reviewed Jun 2, 2017

xinghai-sun commented Jun 2, 2017

1. Fix incorrect decoder result printing.

5de8e43

2. Fix incorrect batch-norm usage in RNN.
3. Fix overlapping train/dev/test manifests.
4. Update README.md and requirements.txt.
5. Expose more arguments to users via argparse.
6. Update all other details.

lcy-seso approved these changes Jun 2, 2017

lcy-seso merged commit ec9cce9 into PaddlePaddle:develop Jun 2, 2017

xinghai-sun deleted the ds2 branch June 3, 2017 06:41


xinghai-sun commented May 24, 2017

lcy-seso left a comment

pkuyym left a comment

kuke left a comment

xinghai-sun left a comment

xinghai-sun left a comment

xinghai-sun commented Jun 2, 2017 (edited)

lcy-seso left a comment

xinghai-sun commented Jun 2, 2017 (edited)