Add audio data provider and a simplified DeepSpeech2 model configuration. #55

Merged
merged 7 commits into from
Jun 2, 2017


xinghai-sun
Contributor

resolved issue 2226
resolved issue 2231

…2 model configuration.

A bug occurs when running training.
feeding=feeding)
args.num_passes -= 1
# other passes without sortagrad
trainer.train(
Contributor Author

If args.use_sortagrad is true, trainer.train is called twice. However, the second trainer.train call gets stuck (no progress, no error). Does trainer.train not support being called multiple times?
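One way to avoid the second trainer.train call altogether is to keep a single call with the full number of passes and move the SortaGrad ordering into the reader. A minimal sketch, assuming the trainer invokes the reader function once per pass; `sortagrad_batch_reader` and the `duration` field are hypothetical names, not code from this PR, and this does not explain why the second call hangs:

```python
import random


def sortagrad_batch_reader(instances, batch_size, seed=0):
    """Hypothetical reader creator implementing SortaGrad.

    The first pass yields batches sorted by utterance duration (shortest
    first); every later pass yields batches of shuffled instances.
    """
    rng = random.Random(seed)
    state = {"pass_id": 0}  # counts how many passes have started

    def reader():
        if state["pass_id"] == 0:
            # SortaGrad: short utterances first during the first pass only.
            ordered = sorted(instances, key=lambda inst: inst["duration"])
        else:
            ordered = list(instances)
            rng.shuffle(ordered)
        state["pass_id"] += 1
        for start in range(0, len(ordered), batch_size):
            yield ordered[start:start + batch_size]

    return reader


data = [{"duration": d} for d in (3.0, 1.0, 2.0)]
reader = sortagrad_batch_reader(data, batch_size=2)
print([[x["duration"] for x in batch] for batch in reader()])  # [[1.0, 2.0], [3.0]]
```

Because the pass counter lives inside the reader, the trainer needs no knowledge of SortaGrad.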

Contributor Author

I think more elaborate interfaces for controlling the training process need to be exposed to users, to allow more flexible training flow control. For example, in this DS2 case the training data needs to change during training; in other cases, parts of the model need to be frozen for a while or trained alternately (e.g. GAN).

Collaborator

@lcy-seso left a comment

almost LGTM.


forward = paddle.layer.recurrent_group(
step=__simple_rnn_step__, input=input)
return forward
Collaborator

This has been fixed; this part needs to be updated accordingly.

Contributor Author

Done.

if isinstance(event, paddle.event.EndPass):
result = trainer.test(reader=test_batch_reader, feeding=feeding)
print "Pass: %d, TestCost: %s" % (event.pass_id, result.cost)
with gzip.open("params.tar.gz", 'w') as f:
Collaborator

Save the trained model under a name that includes the pass number; otherwise, each newly saved model overwrites the previous one.

Contributor Author

Since overfitting rarely happens in DS2, it is not necessary to save multiple models indexed by pass. For now, keeping only the latest one is enough. We might add this in the future if necessary.
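If per-pass checkpoints are added later, the reviewer's suggestion amounts to deriving the file name from `event.pass_id` inside the EndPass handler. A minimal sketch; `save_pass_checkpoint` and the `params.pass-%05d.tar.gz` naming are hypothetical, and `write_tar` stands in for a writer such as `parameters.to_tar`:

```python
import gzip
import os


def save_pass_checkpoint(write_tar, pass_id, out_dir=".",
                         template="params.pass-%05d.tar.gz"):
    """Write one gzip-compressed checkpoint per pass.

    Embedding the pass id in the file name means later passes no longer
    overwrite earlier ones. Returns the path that was written.
    """
    path = os.path.join(out_dir, template % pass_id)
    with gzip.open(path, "wb") as f:
        write_tar(f)  # assumed callback that serializes the parameters into f
    return path
```

Zero-padding the pass id keeps the checkpoint files in pass order when listed alphabetically.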

size=dict_size + 1,
blank=dict_size,
norm_by_times=True)
# max decoder
Collaborator

If max_id is not needed in training, I think it should be put into a testing branch.

Contributor Author

Will refactor this part later, together with the beam search decoder.

rnn_size=args.rnn_layer_size)

# load parameters
parameters = paddle.parameters.Parameters.from_tar(
Collaborator

Save / Load models according to the pass index.

Contributor Author

Since overfitting rarely happens in DS2, it is not necessary to save multiple models indexed by pass. For now, keeping only the latest one is enough. We might add this in the future if necessary.


URL_TEST = "http://www.openslr.org/resources/12/test-clean.tar.gz"
URL_DEV = "http://www.openslr.org/resources/12/dev-clean.tar.gz"
URL_TRAIN = "http://www.openslr.org/resources/12/train-clean-100.tar.gz"
Collaborator

Add MD5 checksums to verify the downloaded data.

Contributor Author

Done.
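The MD5 check requested above can be done in a streaming fashion, so that multi-gigabyte archives are never read into memory at once. A minimal sketch using `hashlib`; `md5file` and `check_download` are hypothetical helpers, and the published checksums for the OpenSLR archives are deliberately not reproduced here:

```python
import hashlib


def md5file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def check_download(path, expected_md5):
    """Raise if a downloaded file does not match its expected checksum."""
    actual = md5file(path)
    if actual != expected_md5:
        raise RuntimeError("MD5 mismatch for %s: expected %s, got %s"
                           % (path, expected_md5, actual))
```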


class DataGenerator(object):
"""
DataGenerator provides basic audio data preprocessing pipeline, and offer
Collaborator

offer -> offers

Contributor Author

Done.

return target_dir


def create_manifest(data_dir, manifest_path):
Collaborator

What does "manifest" mean? Please add some comments.

Contributor Author

Done.
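For readers with the same question: a manifest is an index file listing one utterance per line. A minimal sketch, assuming a JSON-lines layout with `audio_filepath`, `duration` (in seconds) and `text` fields; the field names and both helpers are illustrative and not necessarily what `create_manifest` writes:

```python
import json


def write_manifest(entries, manifest_path):
    """Write one JSON object per line: audio path, duration (s), transcript."""
    with open(manifest_path, "w") as f:
        for audio_path, duration, text in entries:
            f.write(json.dumps({"audio_filepath": audio_path,
                                "duration": duration,
                                "text": text}) + "\n")


def read_manifest(manifest_path, max_duration=float("inf")):
    """Load manifest entries, skipping utterances longer than max_duration."""
    with open(manifest_path) as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return [e for e in entries if e["duration"] <= max_duration]
```

Storing the duration up front lets a data provider filter or sort utterances without decoding any audio.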

fc = paddle.layer.fc(
input=rnn_group_output,
size=dict_size + 1,
act=paddle.activation.Linear(),
Collaborator

Note that the activation should be softmax in inference mode.

Contributor Author

The current code only contains the Best Path Decoder, which does not require a softmax activation.

pip install soundfile

# For Ubuntu only
apt-get install libsndfile1
Collaborator

This needs to be explained in the documentation.

Contributor

For the Python part, you could provide a requires.txt instead.

Contributor Author

Done.

Contributor

@pkuyym left a comment

Almost LGTM.

stride_ms=10.0,
window_ms=20.0,
max_frequency=None):
self.__max_duration__ = max_duration
Contributor

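In Python it's best not to name your own functions and variables __XX__, because __init__, __del__, etc. follow Python's built-in naming scheme (the same applies to the lines below). For private functions or variables, a leading underscore is enough. As I recall, __XX is private and cannot be accessed by subclasses, while _XX is also private but can be accessed by subclasses.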

Contributor Author

This currently follows Paddle's Python conventions. Shall we discuss it with everyone before changing it?
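For reference, the Python rule behind this thread: an identifier with two leading underscores and no trailing double underscore is name-mangled to `_ClassName__name`, whereas a `__name__`-style identifier is not mangled at all (it is an ordinary attribute whose naming pattern is conventionally reserved for Python's special names), and a single leading underscore is a convention only. A small self-contained demonstration; the class names are illustrative:

```python
class Base(object):
    def __init__(self):
        self._internal = 1    # single underscore: convention only, no mangling
        self.__private = 2    # stored as self._Base__private (name mangling)
        self.__dunder__ = 3   # trailing "__": NOT mangled, an ordinary attribute


class Child(Base):
    def get_internal(self):
        return self._internal  # works: plain attribute lookup

    def get_private(self):
        # Compiled as self._Child__private, which was never set.
        return self.__private


c = Child()
print(c.get_internal())   # 1
print(c._Base__private)   # 2: the mangled name is still reachable
print(c.__dunder__)       # 3: no mangling for __name__-style attributes
try:
    c.get_private()
except AttributeError as err:
    print("AttributeError:", err)
```

So attributes such as `self.__max_duration__` behave like public attributes; the concern is stylistic rather than functional.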

self.__stride_ms__ = stride_ms
self.__window_ms__ = window_ms
self.__max_frequency__ = max_frequency
self.__random__ = random.Random(RANDOM_SEED)
Contributor

Would you consider exposing an interface that lets users specify the random seed?

Contributor Author

RANDOM_SEED is not important; I'd rather not expose it.
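Should the seed ever need to be configurable, a common pattern is a keyword argument that defaults to the module constant, so existing callers keep the current behaviour. A minimal sketch; `ShuffleDemo`, its `random_seed` parameter and the value assigned to `RANDOM_SEED` are hypothetical:

```python
import random

RANDOM_SEED = 0  # hypothetical value; the PR's actual constant is not shown


class ShuffleDemo(object):
    """Stand-in for a data generator that owns a private RNG."""

    def __init__(self, random_seed=RANDOM_SEED):
        # A dedicated Random instance keeps shuffling reproducible and
        # independent of the global random module state.
        self._rng = random.Random(random_seed)

    def shuffled(self, items):
        items = list(items)
        self._rng.shuffle(items)
        return items
```

Two instances built with the same seed produce the same shuffle order, which is the reproducibility property the fixed constant already provides.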

norm_by_times=True)
# max decoder
max_id = paddle.layer.max_id(input=fc)
return cost, max_id
Contributor

You could return either cost or max_id depending on whether it is training or inference. For inference, cost shouldn't be needed, right?

Contributor Author

Will refactor this part later when merging with the beam search decoder.

Collaborator

@kuke left a comment

almost LGTM

"--use_gpu", default=True, type=bool, help="Use gpu or not.")
parser.add_argument(
"--use_sortagrad", default=False, type=bool, help="Use sortagrad or not.")
parser.add_argument(
Collaborator

Line 20 already defines "--trainer" with the same help message. Is this a duplicate definition?

Contributor Author

Yes. Removed.
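Two argparse details are relevant to this hunk. argparse only rejects an exact duplicate option string (raising argparse.ArgumentError under the default conflict handler); two differently named flags with the same help text are accepted silently, which is why such duplicates are easy to miss. Separately, `type=bool` does not parse "False" as false, because `bool()` of any non-empty string is True. A minimal sketch of a converter; `str2bool` is a hypothetical helper, not part of the PR:

```python
import argparse


def str2bool(value):
    """Parse common textual booleans; note that bool("False") would be True."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


parser = argparse.ArgumentParser()
parser.add_argument(
    "--use_gpu", default=True, type=str2bool, help="Use gpu or not.")
parser.add_argument(
    "--use_sortagrad", default=False, type=str2bool,
    help="Use sortagrad or not.")

args = parser.parse_args(["--use_gpu", "False"])
print(args.use_gpu, args.use_sortagrad)  # False False

try:
    # Registering the exact same option string again is rejected.
    parser.add_argument("--use_gpu", type=str2bool)
except argparse.ArgumentError as err:
    print("duplicate option:", err)
```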

Contributor Author

@xinghai-sun left a comment

Done. Thanks for the review!


Contributor Author

@xinghai-sun left a comment

Thanks for the review.



2. Fix incorrect batch-norm usage in RNN.
3. Fix overlapping train/dev/test manifests.
4. Update README.md and requirements.txt.
5. Expose more arguments to users in argparser.
6. Update all other details.
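Item 3 can be guarded against with a quick sanity check that no audio file appears in more than one split. A minimal sketch, assuming each manifest has already been loaded as a list of dicts carrying an `audio_filepath` key (an assumption about the manifest layout); `find_overlaps` is a hypothetical helper:

```python
import itertools


def find_overlaps(splits):
    """Return {(split_a, split_b): shared_paths} for every overlapping pair.

    `splits` maps a split name (e.g. "train") to its list of manifest entries.
    An empty result means the splits are disjoint.
    """
    paths = {name: set(e["audio_filepath"] for e in entries)
             for name, entries in splits.items()}
    overlaps = {}
    for a, b in itertools.combinations(sorted(paths), 2):
        shared = paths[a] & paths[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps
```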
Contributor Author

xinghai-sun commented Jun 2, 2017

The model now runs smoothly, with good convergence and reasonable decoding results.

Collaborator

@lcy-seso left a comment

LGTM

@lcy-seso merged commit ec9cce9 into PaddlePaddle:develop Jun 2, 2017
@xinghai-sun deleted the ds2 branch June 3, 2017 06:41
5 participants