Deep Speech 2 on PaddlePaddle: Plan & Task Breakdown #44

Closed
xinghai-sun opened this issue May 17, 2017 · 1 comment

xinghai-sun commented May 17, 2017

We are planning to build Deep Speech 2 (DS2) [1], a powerful Automatic Speech Recognition (ASR) engine, on PaddlePaddle. For the first-stage plan, we have the following short-term goals:

  • Release a basic distributed implementation of DS2 on PaddlePaddle.
  • Contribute a Deep Speech chapter to the PaddlePaddle Book.

Intensive system optimization and a low-latency inference library (see [1] for details) are not covered in this first-stage plan.

Tasks

We roughly break down the project into 14 tasks:

  1. Develop an audio data provider.
  2. Create a simplified DS2 model configuration.
  3. Add support for variable-shaped dense-vector (image) batches of input data.
  4. Develop a new lookahead-row-convolution layer (see [1] for details).
  5. Build a KenLM n-gram language model for beam search decoding.
  6. Develop a beam search decoder with CTC + LM + WORDCOUNT.
  7. Develop a Word Error Rate (WER) evaluator:
    • Update ctc_error_evaluator (CER) to support WER.
  8. Prepare an internal dataset for Mandarin (optional).
  9. Create standard DS2 model configuration:
    • With variable-length audio sequences (needs Task 3).
    • With unidirectional-GRU + row-convolution (needs Task 4).
    • With CTC-LM beam search decoder (needs Tasks 5 and 6).
  10. Make it run perfectly on clusters.
  11. Experiments and benchmarking (for accuracy, not efficiency):
    • With public English dataset.
    • With internal (Baidu) Mandarin dataset (optional).
  12. Time profiling and optimization.
  13. Prepare docs.
  14. Prepare PaddlePaddle Book chapter with a simplified version.
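
To make the goal of Task 7 concrete, here is a minimal sketch of a word error rate computation: the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. It is plain Python for illustration only and is not the ctc_error_evaluator API; the function name `word_error_rate` and whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which is an assumption (Mandarin would need
    character- or segmenter-based tokens instead).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing "the cat sat on the mat" with "the cat sit on mat" counts one substitution and one deletion over six reference words. The same recurrence applied to characters instead of words gives a character error rate, which is why extending a CER evaluator to WER is mostly a change of tokenization.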

Task Dependency

Tasks within the same phase can be worked on in parallel:

| Roadmap | Description | Parallelizable Tasks |
| --- | --- | --- |
| Phase I | Basic model & components | Task 1 ~ Task 8 |
| Phase II | Standard model & benchmarking & profiling | Task 9 ~ Task 12 |
| Phase III | Documentation | Task 13 ~ Task 14 |
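
As a small sanity check on the roadmap, the sketch below encodes the three phases and the only explicit dependencies stated in the task list (Task 9 needs Tasks 3, 4, 5 and 6), then verifies that every prerequisite sits in an earlier phase than the task that needs it. The names `PHASES`, `DEPENDS_ON`, `phase_of` and `violations` are hypothetical and exist only for this illustration.

```python
# Phase -> task numbers, taken from the roadmap in this issue.
PHASES = {
    1: range(1, 9),    # Task 1 ~ Task 8
    2: range(9, 13),   # Task 9 ~ Task 12
    3: range(13, 15),  # Task 13 ~ Task 14
}

# Explicit "needs Task N" notes from the task list (Task 9 only).
DEPENDS_ON = {9: [3, 4, 5, 6]}


def phase_of(task: int) -> int:
    """Return the phase number that contains the given task."""
    for phase, tasks in PHASES.items():
        if task in tasks:
            return phase
    raise KeyError(f"task {task} is not assigned to any phase")


def violations(depends_on: dict) -> list:
    """List (task, prerequisite) pairs where the prerequisite is not
    scheduled in a strictly earlier phase than the task."""
    return [
        (task, dep)
        for task, deps in depends_on.items()
        for dep in deps
        if phase_of(dep) >= phase_of(task)
    ]


if __name__ == "__main__":
    print(violations(DEPENDS_ON))
```

An empty result means tasks inside one phase never wait on each other, which is the property that lets them run in parallel.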

An issue for each task will be created later. Contributions, discussions, and comments are all welcome and highly appreciated!

Possible Future Work

  • Efficiency Improvement
  • Accuracy Improvement
  • Low-latency Inference Library
  • Large-scale benchmarking

References

  1. Dario Amodei et al., Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. ICML 2016.
@shanyi15

Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up after it is closed, please feel free to reopen it, and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure and thank you for your support of PaddlePaddle!

wojtuss pushed a commit to wojtuss/models that referenced this issue Mar 4, 2019
…tion-capitest-imageclassif

extracted analyzer_image_classification test from Paddle