Add a multi-process version of xmap_reader to replace the multi-threaded version for a significant speedup, and add a seqbin data parser for internal 1w data training. #355
Add seqbin data parser to adapt to internal 1w data training.
Great! Almost LGTM.
deep_speech_2/data_utils/audio.py
Outdated
@@ -115,6 +117,46 @@ def slice_from_file(cls, file, start=None, end=None):
        return cls(data, sample_rate)

    @classmethod
    def from_sequence_file(cls, filepath):
        """Create audio segment from sequence file.
Please add a short comment describing the sequence file format.
Done.
deep_speech_2/data_utils/utility.py
Outdated
        yield sample
        sample = out_queue.get()
    finish = 1
    while finish < process_num:
I'm not sure why we need two while loops here. The only exit condition is finish >= process_num.
Yes. We can remove the first loop and set finish = 0 at Line 155; the behavior is unchanged.
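For reference, the single-loop form being discussed could look roughly like the sketch below. It is not the code in this PR: the `END` marker, the `drain` name, and the assumption that each worker puts exactly one end marker on the output queue are all hypothetical and only serve to illustrate why one loop with `finish = 0` is enough.

```python
import queue

# Hypothetical end-of-stream marker; assumed to be sent once by each worker.
END = "__end__"


def drain(out_queue, process_num):
    """Yield samples until every worker has sent its end marker."""
    finish = 0  # number of workers known to have finished
    while finish < process_num:
        sample = out_queue.get()
        if isinstance(sample, str) and sample == END:
            finish += 1  # one more worker is done
        else:
            yield sample


# Simulate two workers whose outputs and end markers interleave.
q = queue.Queue()
for item in [1, END, 2, 3, END]:
    q.put(item)
print(list(drain(q, process_num=2)))  # [1, 2, 3]
```

Starting the counter at 0 lets the same loop handle the first end marker as well, so a separate loop that runs until the first marker arrives is not needed.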
LGTM
Resolved #354
Done:
Added xmap_reader_mp to replace the multi-threaded paddle.reader.xmap_reader. This speeds up training on the internal 1w dataset by more than 3x, and GPU utilization is back up to 81% (from 15%).
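As a rough illustration of the multi-process arrangement, here is a minimal Python 3 sketch: one feeder process, `process_num` mapper processes, and a consumer that counts end markers. It is not the `xmap_reader_mp` from this PR. All names (`xmap_reader_mp_sketch`, `_feed`, `_work`, `_END`) are hypothetical, output order is not preserved, and it assumes a platform where the `fork` start method is available so that `mapper` and `reader` do not need to be picklable.

```python
import multiprocessing as mp

# Hypothetical end-of-stream marker passed through both queues.
_END = "__xmap_end__"


def _is_end(sample):
    return isinstance(sample, str) and sample == _END


def _feed(reader, in_queue, process_num):
    """Push raw samples, then one end marker per worker."""
    for sample in reader():
        in_queue.put(sample)
    for _ in range(process_num):
        in_queue.put(_END)


def _work(mapper, in_queue, out_queue):
    """Map samples until an end marker arrives, then forward the marker."""
    sample = in_queue.get()
    while not _is_end(sample):
        out_queue.put(mapper(sample))
        sample = in_queue.get()
    out_queue.put(_END)


def xmap_reader_mp_sketch(mapper, reader, process_num, buffer_size):
    """Return a reader that applies mapper in process_num worker processes."""
    ctx = mp.get_context("fork")  # assumption: fork is available (e.g. Linux)

    def xreader():
        in_queue = ctx.Queue(buffer_size)
        out_queue = ctx.Queue(buffer_size)
        procs = [ctx.Process(target=_feed,
                             args=(reader, in_queue, process_num))]
        procs += [ctx.Process(target=_work,
                              args=(mapper, in_queue, out_queue))
                  for _ in range(process_num)]
        for p in procs:
            p.daemon = True
            p.start()
        finish = 0
        while finish < process_num:
            sample = out_queue.get()
            if _is_end(sample):
                finish += 1
            else:
                yield sample
        for p in procs:
            p.join()

    return xreader
```

A call such as `xmap_reader_mp_sketch(lambda x: x * x, lambda: iter(range(10)), 3, 4)()` would yield the ten squares in an unspecified order. Separate processes sidestep the GIL for CPU-bound mappers, which is the usual motivation for replacing a thread-based reader.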