Add a multi-process version of xmap_reader to replace the multi-thread version for significant acceleration, and add a seqbin data parser for internal 1w data training. #355

xinghai-sun · 2017-10-07T13:24:45Z

Resolved #354

Done:

Add xmap_reader_mp, a multi-process replacement for the multi-thread paddle.reader.xmap_reader.
Add a seqbin data parser to support training on the internal 1w English dataset.

This speeds up training on the internal 1w dataset by more than 3x and brings GPU utilization back up to 81% (from 15%).
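
A minimal sketch of how a multi-process xmap reader can be structured is shown below. It is not the code merged in this PR: the function name `xmap_reader_mp_sketch`, its parameters, the use of `None` as the end signal, and the `fork` start method (Linux) are assumptions made for illustration. Separate worker processes are not limited by Python's global interpreter lock, which is plausibly why they outpace worker threads for CPU-bound feature extraction. The `daemon=True` setting mirrors the "process daemon property" mentioned in commit 0aaf93a.

```python
import multiprocessing


def xmap_reader_mp_sketch(mapper, reader, process_num, buffer_size):
    """Hypothetical multi-process xmap reader (illustrative sketch only).

    Output order is NOT preserved, and `reader` must never yield None,
    because None is used here as the end-of-data signal.
    """
    # Assumption: POSIX "fork" start method, so the nested worker
    # functions below do not need to be picklable.
    ctx = multiprocessing.get_context("fork")
    end_flag = None

    def read_worker(in_queue):
        for sample in reader():
            in_queue.put(sample)
        # One end signal per mapping process so that each of them stops.
        for _ in range(process_num):
            in_queue.put(end_flag)

    def map_worker(in_queue, out_queue):
        sample = in_queue.get()
        while sample is not end_flag:
            out_queue.put(mapper(sample))
            sample = in_queue.get()
        # Tell the consumer that this mapping process has finished.
        out_queue.put(end_flag)

    def xreader():
        in_queue = ctx.Queue(buffer_size)
        out_queue = ctx.Queue(buffer_size)
        # daemon=True: workers are terminated if the main process exits.
        workers = [ctx.Process(target=read_worker, args=(in_queue,), daemon=True)]
        workers += [
            ctx.Process(target=map_worker, args=(in_queue, out_queue), daemon=True)
            for _ in range(process_num)
        ]
        for worker in workers:
            worker.start()
        # Stop once every mapping process has sent its end signal.
        finish = 0
        while finish < process_num:
            sample = out_queue.get()
            if sample is end_flag:
                finish += 1
            else:
                yield sample
        for worker in workers:
            worker.join()

    return xreader
```

Because several processes write to the same output queue, samples arrive in nondeterministic order, so a caller that needs ordering would have to sort or tag samples itself.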

pkuyym

Great! Almost LGTM.

pkuyym · 2017-10-08T08:02:50Z

deep_speech_2/data_utils/audio.py

@@ -115,6 +117,46 @@ def slice_from_file(cls, file, start=None, end=None):
        return cls(data, sample_rate)

    @classmethod
+    def from_sequence_file(cls, filepath):
+        """Create audio segment from sequence file.


Please add a brief comment describing the sequence file format.

pkuyym · 2017-10-08T08:20:03Z

deep_speech_2/data_utils/utility.py

+            yield sample
+            sample = out_queue.get()
+        finish = 1
+        while finish < process_num:


I'm not sure why we need two while loops here. The only exit condition is finish >= process_num.

Yes. We can remove the first loop and initialize finish = 0 at line 155 with no change in behavior.
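
The claimed equivalence can be checked in isolation. Below, `drain_two_loops` is an assumed reconstruction of the reviewed code from the quoted diff lines (the first loop's header is not shown in the excerpt), and `drain_one_loop` is the suggested simplification; both function names and the use of `None` as the end signal are hypothetical.

```python
import queue


def drain_two_loops(out_queue, process_num, end_flag=None):
    # Assumed shape of the reviewed code: the first loop runs until the
    # first end signal, then a second loop counts the remaining ones.
    sample = out_queue.get()
    while sample is not end_flag:
        yield sample
        sample = out_queue.get()
    finish = 1
    while finish < process_num:
        sample = out_queue.get()
        if sample is end_flag:
            finish += 1
        else:
            yield sample


def drain_one_loop(out_queue, process_num, end_flag=None):
    # Suggested simplification: a single loop with the counter starting at 0.
    finish = 0
    while finish < process_num:
        sample = out_queue.get()
        if sample is end_flag:
            finish += 1
        else:
            yield sample
```

The first loop only handles the samples that precede the first end signal, and the single loop treats that prefix identically when its counter starts at 0, so both versions yield the same samples and stop at the same point.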

pkuyym

LGTM

Add multiprocess version of xmap_reader to speedup training.

a3e2797

Add seqbin data parser to adapt to internal 1w data training.

xinghai-sun requested review from kuke and pkuyym October 7, 2017 13:24

Set process daemon property and reset default value of num_proc_data …

0aaf93a

…arguments.

xinghai-sun force-pushed the us_adapt branch from 35c1a42 to 0aaf93a October 7, 2017 14:06

pkuyym reviewed Oct 8, 2017

Update by following reviewer's comments for pull request PaddlePaddle…

efef5d9

…#355.

pkuyym approved these changes Oct 9, 2017

xinghai-sun merged commit 16a2509 into PaddlePaddle:develop Oct 9, 2017

xinghai-sun deleted the us_adapt branch October 9, 2017 14:35