Refactor the whole data preprocessor part for DeepSpeech2. #91

Merged
merged 3 commits into PaddlePaddle:develop from ds2_refactor_data
Jun 14, 2017

xinghai-sun
Contributor

resolve #90

  • Refactor the data preprocessor with newly added classes such as AudioSegment, SpeechSegment, TextFeaturizer, AudioFeaturizer and SpeechFeaturizer.
  • Add data augmentation interfaces and classes such as AugmentorBase, AugmentationPipeline and VolumePerturbAugmentor, to make it easier to add more data augmentation models.
  • Separate the normalizer's mean/std computation from DataGenerator by adding FeatureNormalizer.
  • Add an independent tool, compute_mean_std.py, for users to create the mean/std file before training.
  • Re-organize the data directory into datasets and data_utils.
  • Add module, class and function docstrings, and update README.md.
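The augmentation bullet describes a base interface, concrete augmentors and a pipeline that chains them. The following is a minimal, stdlib-only sketch of how such pieces could fit together. Several details are assumptions made for illustration and are not necessarily the PR's actual API:

- the method name `transform_audio`
- the constructor parameters
- the `(augmentor, probability)` pairing

`AudioSegment` here is a simplified stand-in.

```python
import random
from abc import ABC, abstractmethod


class AudioSegment(object):
    """Simplified stand-in: a list of float samples plus a sample rate."""

    def __init__(self, samples, sample_rate):
        self.samples = list(samples)
        self.sample_rate = sample_rate


class AugmentorBase(ABC):
    """Hypothetical interface every augmentation model implements."""

    @abstractmethod
    def transform_audio(self, audio_segment):
        """Return an augmented copy of audio_segment."""


class VolumePerturbAugmentor(AugmentorBase):
    """Apply a random gain drawn uniformly from [min_gain_dB, max_gain_dB].

    Parameter names are assumptions for this sketch.
    """

    def __init__(self, rng, min_gain_dB, max_gain_dB):
        self._rng = rng
        self._min_gain_dB = min_gain_dB
        self._max_gain_dB = max_gain_dB

    def transform_audio(self, audio_segment):
        gain_dB = self._rng.uniform(self._min_gain_dB, self._max_gain_dB)
        scale = 10.0 ** (gain_dB / 20.0)  # dB -> linear amplitude factor
        return AudioSegment([s * scale for s in audio_segment.samples],
                            audio_segment.sample_rate)


class AugmentationPipeline(object):
    """Apply each (augmentor, probability) pair in order (assumed design)."""

    def __init__(self, augmentors, rng):
        self._augmentors = augmentors  # list of (AugmentorBase, float)
        self._rng = rng

    def transform_audio(self, audio_segment):
        for augmentor, prob in self._augmentors:
            # Each augmentor fires independently with its own probability.
            if self._rng.uniform(0.0, 1.0) < prob:
                audio_segment = augmentor.transform_audio(audio_segment)
        return audio_segment


rng = random.Random(0)
pipeline = AugmentationPipeline(
    [(VolumePerturbAugmentor(rng, min_gain_dB=-6.0, max_gain_dB=6.0), 0.5)],
    rng)
augmented = pipeline.transform_audio(AudioSegment([0.1, -0.2, 0.3], 16000))
```

Adding a new augmentation model then only requires another `AugmentorBase` subclass; the pipeline itself does not change.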

…ize dir, add augmentation interfaces etc.).

1. Refactor the data preprocessor with newly added classes AudioSegment, SpeechSegment, TextFeaturizer, AudioFeaturizer and SpeechFeaturizer.
2. Add data augmentation interfaces and classes AugmentorBase, AugmentationPipeline, VolumePerturbAugmentor, etc.
3. Separate the normalizer's mean and std computation from training by adding FeatureNormalizer and a separate tool, compute_mean_std.py.
4. Re-organize the directory structure.
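Item 3 splits normalization into two steps. First, estimate per-dimension statistics once, before training. Then, during training, apply (x - mean) / std. Below is a minimal numpy sketch of that idea. The following are assumptions for illustration, not the PR's actual FeatureNormalizer interface:

- the method names (`from_features`, `save`, `load`, `apply`)
- the `.npz` file format
- the (feature_dim, num_frames) layout

```python
import numpy as np


class FeatureNormalizer(object):
    """Per-dimension mean/std normalizer (illustrative sketch).

    Features are assumed to be 2-D arrays of shape (feature_dim, num_frames).
    """

    def __init__(self, mean, std, eps=1e-20):
        self._mean = mean
        self._std = std
        self._eps = eps  # guards against division by zero for constant dims

    @classmethod
    def from_features(cls, features_list):
        # Concatenate all frames along the time axis, then compute statistics
        # per feature dimension (row). keepdims gives shape (dim, 1), which
        # broadcasts over any number of frames.
        frames = np.hstack(features_list)
        mean = np.mean(frames, axis=1, keepdims=True)
        std = np.std(frames, axis=1, keepdims=True)
        return cls(mean, std)

    def save(self, filepath):
        # The kind of file a standalone tool like compute_mean_std.py could
        # write once before training (format assumed here).
        np.savez(filepath, mean=self._mean, std=self._std)

    @classmethod
    def load(cls, filepath):
        with np.load(filepath) as data:
            return cls(data["mean"], data["std"])

    def apply(self, features):
        return (features - self._mean) / (self._std + self._eps)


rng = np.random.RandomState(0)
feats = [rng.randn(13, 120) * 2.0 + 5.0, rng.randn(13, 80) * 2.0 + 5.0]
normalizer = FeatureNormalizer.from_features(feats)
normalized = normalizer.apply(feats[0])
```

Keeping the statistics in a file means they are computed once rather than on every training run.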
Collaborator

qingqing01 left a comment

Going forward, I think it would be worth adding documentation on the data processing; the process is fairly complex.

@@ -86,6 +83,12 @@
help="If set None, the training will start from scratch. "
"Otherwise, the training will resume from "
"the existing model of this path. (default: %(default)s)")
parser.add_argument(
"--augmentation_config",
Collaborator

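Is an augmentation_config required for an actual run? I only see the JSON format described in the code comments, not a JSON file. If it is required at runtime, could you provide a JSON file so that users only need to edit it?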

Contributor Author

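Good suggestion. Currently augmentation_config is a str. Because only the augmentation interfaces are in place for now, the default is augmentation_config='{}', i.e. no augmentation is applied. Configuring a JSON string is indeed inconvenient.
Since the model has many parameters, we can provide a unified config file later.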
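One way to address the request while keeping the string flag is to accept either an inline JSON string or a path to a JSON file. This is a sketch under assumptions:

- `load_augmentation_config` is a hypothetical helper.
- The config schema itself is left unspecified.
- Only the flag name and the `'{}'` default come from this thread.

```python
import argparse
import json
import os


def load_augmentation_config(value):
    """Hypothetical helper: parse the --augmentation_config value.

    Accepts either a path to a JSON file (the requested alternative) or an
    inline JSON string (the current behaviour). '{}' means no augmentation.
    """
    if os.path.isfile(value):
        with open(value) as f:
            return json.load(f)
    return json.loads(value)


parser = argparse.ArgumentParser()
parser.add_argument(
    "--augmentation_config",
    default='{}',
    type=str,
    help="Inline JSON string or path to a JSON file; "
    "'{}' disables augmentation. (default: %(default)s)")

args = parser.parse_args([])
augmentation_config = load_augmentation_config(args.augmentation_config)
```

Checking for a file first keeps existing inline-string usage working unchanged.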

Collaborator

qingqing01 left a comment

LGTM.

xinghai-sun merged commit b1e2b23 into PaddlePaddle:develop on Jun 14, 2017
xinghai-sun deleted the ds2_refactor_data branch on June 14, 2017 at 06:58
Contributor

chrisxu2016 left a comment

LGTM

        :rtype: AudioSegment
        """
        samples, sample_rate = soundfile.read(file, dtype='float32')
        return cls(samples, sample_rate)
Contributor

Does this only read .wav files by default?

        :param gain: Gain in decibels to apply to samples.
        :type gain: float
        """
        self._samples *= 10.**(gain / 20.)
Contributor

Suggest returning a newly created audio object here instead, so that this method can be reused later when add_noise is added:
return type(self)(10.**(gain / 20.) * self._samples, self._sample_rate)
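Below is a sketch of the reviewer's suggestion. `apply_gain` returns a new segment, which lets a later add_noise scale the noise segment without mutating it. Several names are assumptions rather than code from this PR:

- `apply_gain` (the method name is assumed)
- `add_noise` (hypothetical)
- `rms_db` (hypothetical)

numpy is assumed.

```python
import numpy as np


class AudioSegment(object):
    """Minimal audio segment whose operations return new objects (sketch)."""

    def __init__(self, samples, sample_rate):
        self._samples = np.asarray(samples, dtype='float32')
        self._sample_rate = sample_rate

    @property
    def samples(self):
        return self._samples.copy()

    @property
    def rms_db(self):
        # Root-mean-square level in dB relative to amplitude 1.0.
        return 10 * np.log10(np.mean(self._samples ** 2))

    def apply_gain(self, gain):
        """Return a new segment scaled by `gain` dB; self is left untouched."""
        return type(self)(10.**(gain / 20.) * self._samples, self._sample_rate)

    def add_noise(self, noise, snr_dB):
        """Hypothetical: mix in `noise` at the requested SNR (in dB).

        Reuses apply_gain on the noise segment, which is safe only because
        apply_gain does not mutate. Assumes noise is at least as long as self.
        """
        noise_gain_db = self.rms_db - noise.rms_db - snr_dB
        scaled_noise = noise.apply_gain(noise_gain_db)
        n = len(self._samples)
        return type(self)(self._samples + scaled_noise.samples[:n],
                          self._sample_rate)
```

With the in-place version, add_noise would either have to copy the noise segment first or silently alter the caller's noise data.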

        :return: Number of samples.
        :rtype: int
        """
        return self._samples.shape(0)
Contributor

It should be self._samples.shape[0]; change () to [].
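The reason for the fix is that `ndarray.shape` is a tuple attribute, not a method, so calling it fails at runtime. The short numpy demonstration below uses an arbitrary one-second, 16 kHz buffer.

```python
import numpy as np

samples = np.zeros(16000, dtype='float32')

# shape is a tuple, so it is indexed with [] to get the sample count.
num_samples = samples.shape[0]

try:
    samples.shape(0)  # calling a tuple is a TypeError
except TypeError as err:
    error_message = str(err)
```

For a 1-D sample array, `len(samples)` gives the same result as `samples.shape[0]`.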

Development

Successfully merging this pull request may close these issues.

Refactor the whole data preprocessor part for DeepSpeech2.
3 participants