Refactor the whole data preprocessor part for DeepSpeech2. #91
…ize dir, add augmentation interfaces etc.).
1. Refactor the data preprocessor with the newly added classes AudioSegment, SpeechSegment, TextFeaturizer, AudioFeaturizer, SpeechFeaturizer.
2. Add data augmentation interfaces and the classes AugmentorBase, AugmentationPipeline, VolumePerturbAugmentor etc.
3. Separate the normalizer's mean and std computation from training by adding FeatureNormalizer and a separate tool, compute_mean_std.py.
4. Re-organize the directory.
As a follow-up, I think we could add documentation for the data processing; the process is fairly complex.
@@ -86,6 +83,12 @@
help="If set None, the training will start from scratch. "
"Otherwise, the training will resume from "
"the existing model of this path. (default: %(default)s)")
parser.add_argument(
"--augmentation_config",
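Does augmentation_config need to be provided for an actual run? I only see the JSON format described in the code comments, not a JSON file. If it is required at run time, could you provide a JSON file that users can simply edit when they need it?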
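Good suggestion. augmentation_config is currently a str (augmentation only has its interfaces in place for now, so the default is augmentation_config='{}', meaning augmentation has no effect), and configuring it through a JSON string is indeed inconvenient.
Since the model has many parameters, we can provide a single unified config file later.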
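For reference, a minimal sketch of how the string-valued option behaves today. The flag name and the '{}' default come from this thread; the "augmentors", "type", and "prob" keys are only an assumed schema for illustration and may not match the format described in the code comments.

```python
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument(
    "--augmentation_config",
    default='{}',
    type=str,
    help="Augmentation configuration as a JSON string. "
    "(default: %(default)s)")


def parse_augmentation_config(config_str):
    """Decode the JSON string; an empty object means no augmentation."""
    return json.loads(config_str)


# Default: nothing is configured, so augmentation is effectively disabled.
default_args = parser.parse_args([])
default_config = parse_augmentation_config(default_args.augmentation_config)

# Hypothetical non-empty configuration passed on the command line.
custom_args = parser.parse_args([
    "--augmentation_config",
    '{"augmentors": [{"type": "volume", "prob": 0.5}]}',
])
custom_config = parse_augmentation_config(custom_args.augmentation_config)
```

Quoting a JSON string like this in a shell is error-prone, which is the inconvenience raised above; a config file would avoid it.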
LGTM.
LGTM
:rtype: AudioSegment
"""
samples, sample_rate = soundfile.read(file, dtype='float32')
return cls(samples, sample_rate)
Does this only read .wav files by default?
:param gain: Gain in decibels to apply to samples.
:type gain: float
"""
self._samples *= 10.**(gain / 20.)
I suggest returning a newly created audio object here, so that this method can be reused when add_noise is added later:
return type(self)(10.**(gain / 20.) * self._samples, self._sample_rate)
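To make the difference concrete, here is a small sketch contrasting the in-place version from the diff with the suggested one. The class below is a stripped-down stand-in, not the AudioSegment in this PR, and the method names apply_gain_inplace and with_gain are hypothetical.

```python
import numpy as np


class AudioSegment(object):
    """Minimal stand-in that only holds samples and a sample rate."""

    def __init__(self, samples, sample_rate):
        self._samples = np.asarray(samples, dtype='float32')
        self._sample_rate = sample_rate

    def apply_gain_inplace(self, gain):
        """Variant in the diff: scales this object's own samples."""
        self._samples *= 10.**(gain / 20.)

    def with_gain(self, gain):
        """Suggested variant: leaves self untouched, returns a new object."""
        return type(self)(10.**(gain / 20.) * self._samples,
                          self._sample_rate)


segment = AudioSegment([0.1, -0.2, 0.4], 16000)
# +20 dB corresponds to a factor of 10 in amplitude.
louder = segment.with_gain(20.)
```

Because type(self) is used, a subclass such as SpeechSegment would get back an instance of its own type, provided its constructor accepts the same two arguments.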
:return: Number of samples.
:rtype: int
"""
return self._samples.shape(0)
This should be self._samples.shape[0]; change () to [].
resolve #90
1. Refactor the data preprocessor with the newly added classes AudioSegment, SpeechSegment, TextFeaturizer, AudioFeaturizer, SpeechFeaturizer etc.
2. Add data augmentation interfaces and the classes AugmentorBase, AugmentationPipeline, VolumePerturbAugmentor etc., to make it easier to add more data augmentation models.
3. … DataGenerator. Add FeatureNormalizer.
4. Add a separate tool, compute_mean_std.py, for users to create the mean_std file before training.
5. Re-organize the data directory into datasets and data_utils.
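The augmentation classes named above can be pictured roughly as follows. This is only an illustrative sketch: the class names come from this PR, but the method name transform_audio, the constructor arguments, and the use of plain Python lists for samples are assumptions, not the actual interfaces.

```python
import random


class AugmentorBase(object):
    """Interface that every augmentor is expected to implement."""

    def transform_audio(self, samples):
        raise NotImplementedError


class VolumePerturbAugmentor(AugmentorBase):
    """Scales samples by a random gain drawn from [min_gain_db, max_gain_db]."""

    def __init__(self, rng, min_gain_db, max_gain_db):
        self._rng = rng
        self._min_gain_db = min_gain_db
        self._max_gain_db = max_gain_db

    def transform_audio(self, samples):
        gain = self._rng.uniform(self._min_gain_db, self._max_gain_db)
        scale = 10.**(gain / 20.)  # decibels to linear amplitude
        return [s * scale for s in samples]


class AugmentationPipeline(object):
    """Applies a list of augmentors one after another."""

    def __init__(self, augmentors):
        self._augmentors = list(augmentors)

    def transform_audio(self, samples):
        for augmentor in self._augmentors:
            samples = augmentor.transform_audio(samples)
        return samples


rng = random.Random(0)
# A fixed 20 dB gain makes the result deterministic for this example.
pipeline = AugmentationPipeline([VolumePerturbAugmentor(rng, 20., 20.)])
augmented = pipeline.transform_audio([0.1, -0.2])
# An empty pipeline mirrors the default augmentation_config='{}'.
unchanged = AugmentationPipeline([]).transform_audio([0.1, -0.2])
```

Under this layout, adding a new augmentation model would only require another AugmentorBase subclass plus an entry in the pipeline.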