Add a multi-process version of xmap_reader to replace the multi-thread version for a significant speedup, and add a seqbin data parser for training on internal 1w data. #355

Merged: 3 commits, Oct 9, 2017

@xinghai-sun xinghai-sun commented Oct 7, 2017

Resolved #354

Done:

  1. Add xmap_reader_mp, a multi-process reader that replaces the multi-thread paddle.reader.xmap_reader.
  2. Add a seqbin data parser to support training on the internal 1w English dataset.

This speeds up training on the internal 1w dataset by more than 3x, and GPU utilization rises back to 81% (from 15%).
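For readers unfamiliar with the pattern, here is a minimal sketch of a multi-process mapped reader. It is not the code merged in this PR: the names `xmap_reader_mp_sketch`, `EndSignal`, `_feed`, and `_work` are hypothetical, and the sketch assumes Python 3 with the POSIX "fork" start method. One feeder process pushes raw samples, several workers apply the mapper in parallel, and the consumer stops once every worker has reported completion.

```python
import multiprocessing as mp


class EndSignal(object):
    """Hypothetical sentinel marking the end of a stream."""


def _feed(reader, in_queue, process_num):
    # Push every raw sample, then one end signal per worker.
    for sample in reader():
        in_queue.put(sample)
    for _ in range(process_num):
        in_queue.put(EndSignal())


def _work(mapper, in_queue, out_queue):
    # Map samples until an end signal arrives, then forward the signal.
    sample = in_queue.get()
    while not isinstance(sample, EndSignal):
        out_queue.put(mapper(sample))
        sample = in_queue.get()
    out_queue.put(sample)


def xmap_reader_mp_sketch(mapper, reader, process_num, buffer_size):
    """Return a reader that applies `mapper` in `process_num` processes.

    Output order is not preserved in this sketch.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX "fork" start method

    def xreader():
        in_queue = ctx.Queue(buffer_size)
        out_queue = ctx.Queue(buffer_size)
        procs = [ctx.Process(target=_feed,
                             args=(reader, in_queue, process_num))]
        procs += [ctx.Process(target=_work,
                              args=(mapper, in_queue, out_queue))
                  for _ in range(process_num)]
        for proc in procs:
            proc.daemon = True
            proc.start()
        # Each worker emits exactly one end signal as its last item.
        finished = 0
        while finished < process_num:
            sample = out_queue.get()
            if isinstance(sample, EndSignal):
                finished += 1
            else:
                yield sample
        for proc in procs:
            proc.join()

    return xreader
```

Separate processes sidestep the interpreter lock that limits CPU-bound mappers in a thread-based reader, which is the likely source of the speedup reported above. The bounded queues keep the feeder from running far ahead of the workers.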

Add seqbin data parser to adapt to internal 1w data training.

@pkuyym pkuyym left a comment

Great! Almost LGTM.

@@ -115,6 +117,46 @@ def slice_from_file(cls, file, start=None, end=None):
        return cls(data, sample_rate)

    @classmethod
    def from_sequence_file(cls, filepath):
        """Create audio segment from sequence file.

Please add a brief comment describing the sequence file format.

Contributor Author

Done.

        yield sample
        sample = out_queue.get()
    finish = 1
    while finish < process_num:

@pkuyym pkuyym Oct 8, 2017

I'm not sure why we need two while loops here. The only exit condition is finish >= process_num.

Contributor Author

Yes. We can remove the first loop and set finish = 0 on Line 155; the behavior is unchanged.
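The equivalence can be illustrated with a small sketch. The hunk above shows only part of the loop, so the first loop's header and the second loop's body are reconstructed here as assumptions, and `EndSignal`, `drain_two_loops`, `drain_one_loop`, and `make_queue` are hypothetical names. A plain `queue.Queue` stands in for the inter-process output queue.

```python
import queue


class EndSignal(object):
    """Hypothetical sentinel a worker emits when it has no more samples."""


def drain_two_loops(out_queue, process_num):
    # Assumed shape of the reviewed code: the first loop runs until the
    # first end signal is seen...
    sample = out_queue.get()
    while not isinstance(sample, EndSignal):
        yield sample
        sample = out_queue.get()
    # ...then the second loop counts the remaining end signals.
    finish = 1
    while finish < process_num:
        sample = out_queue.get()
        if isinstance(sample, EndSignal):
            finish += 1
        else:
            yield sample


def drain_one_loop(out_queue, process_num):
    # Simplified form: start the counter at 0 and let one loop handle
    # every end signal, including the first.
    finish = 0
    while finish < process_num:
        sample = out_queue.get()
        if isinstance(sample, EndSignal):
            finish += 1
        else:
            yield sample


def make_queue(items):
    # Helper for experimentation: pre-fill a queue with a fixed stream.
    q = queue.Queue()
    for item in items:
        q.put(item)
    return q
```

Given the same queue contents with `process_num` end signals interleaved among the samples, both generators yield the same samples in the same order, since the first loop is just the `finish == 0` case of the second.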


@pkuyym pkuyym left a comment

LGTM

@xinghai-sun xinghai-sun merged commit 16a2509 into PaddlePaddle:develop Oct 9, 2017
@xinghai-sun xinghai-sun deleted the us_adapt branch October 9, 2017 14:35