---

copyright:
  years: 2019, 2021
lastupdated: "2021-03-21"

subcollection: speech-to-text

---

{:shortdesc: .shortdesc}
{:new_window: target="_blank"}
{:tip: .tip}
{:important: .important}
{:note: .note}
{:deprecated: .deprecated}
{:pre: .pre}
{:codeblock: .codeblock}
{:screen: .screen}
{:javascript: .ph data-hd-programlang='javascript'}
{:java: .ph data-hd-programlang='java'}
{:python: .ph data-hd-programlang='python'}
{:swift: .ph data-hd-programlang='swift'}

# Research references
{: #references}

For more information about the research behind the {{site.data.keyword.speechtotextfull}} service, see the following documents. {{site.data.keyword.IBM}} researchers wrote or contributed to all of these papers.
{: shortdesc}

  1. {: #audhkhasi2017}Audhkhasi, Kartik, Bhuvana Ramabhadran, George Saon, Michael Picheny, and David Nahamoo. Direct Acoustics-to-Word Models for English Conversational Speech Recognition.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 959-963.
  2. {: #audhkhasi2018}Audhkhasi, Kartik, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, and Michael Picheny. Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018).
  3. {: #bahl1983}Bahl, Lalit R., Frederick Jelinek, and Robert L. Mercer. A Maximum Likelihood Approach to Continuous Speech Recognition.{: external} IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5(2) (March 1983): pp. 179-190.
  4. {: #fukuda2017}Fukuda, Takashi, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, and Bhuvana Ramabhadran. Efficient Knowledge Distillation from an Ensemble of Teachers.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 3697-3701.
  5. {: #hinton2012}Hinton, Geoffrey, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, and Brian Kingsbury. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.{: external} IEEE Signal Processing Magazine, Vol. 29(6) (November 2012): pp. 82-97.
  6. {: #jelinek1985}Jelinek, Frederick. The Development of an Experimental Discrete Dictation Recognizer.{: external} Proceedings of the IEEE, Vol. 73(11) (November 1985): pp. 1616-1624.
  7. {: #kurata2017a}Kurata, Gakuto, Abhinav Sethy, Bhuvana Ramabhadran, and George Saon. Empirical Exploration of Novel Architectures and Objectives for Language Models.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 279-283.
  8. {: #kurata2017b}Kurata, Gakuto, Bhuvana Ramabhadran, George Saon, and Abhinav Sethy. Language Modeling with Highway LSTM.{: external} Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (2017).
  9. {: #kurata2019}Kurata, Gakuto, and Kartik Audhkhasi. Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation.{: external} Accepted by Interspeech 2019.
  10. {: #padmanabhan2002}Padmanabhan, Mukund, and Michael Picheny. Large-Vocabulary Speech Recognition Algorithms.{: external} Computer, Vol. 35(4) (2002): pp. 42-50.
  11. {: #picheny2011}Picheny, Michael, David Nahamoo, Vaibhava Goel, Brian Kingsbury, Bhuvana Ramabhadran, Steven J. Rennie, and George Saon. Trends and Advances in Speech Recognition.{: external} {{site.data.keyword.IBM_notm}} Journal of Research and Development, Vol. 55(5) (October 2011): pp. 2:1-2:18.
  12. {: #saon2015}Saon, George, Hong-Kwang J. Kuo, Steven Rennie, and Michael Picheny. The {{site.data.keyword.IBM_notm}} 2015 English Conversational Telephone Speech Recognition System.{: external} Submitted to Interspeech 2015 (2015).
  13. {: #saon2016}Saon, George, Tom Sercu, Steven Rennie, and Hong-Kwang J. Kuo. The {{site.data.keyword.IBM_notm}} 2016 English Conversational Telephone Speech Recognition System.{: external} Submitted to Interspeech 2016 (2016).
  14. {: #saon2017}Saon, George, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, and Phil Hall. English Conversational Telephone Speech Recognition by Humans and Machines.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 132-136.
  15. {: #saon2019}Saon, George, Zoltan Tuske, Kartik Audhkhasi, and Brian Kingsbury. Sequence Noise Injected Training for End-to-end Speech Recognition.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (May 2019).
  16. {: #soltau2014}Soltau, Hagen, George Saon, and Tara N. Sainath. Joint Training of Convolutional and Non-Convolutional Neural Networks.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy (May 2014): pp. 5572-5576.
  17. {: #suzuki2019}Suzuki, Masayuki, Nobuyasu Itoh, Tohru Nagano, Gakuto Kurata, and Samuel Thomas. Improvements to N-gram Language Model Using Text Generated from Neural Language Model.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (May 2019).
  18. {: #thomas2019}Thomas, Samuel, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltan Tuske, George Saon, Brian Kingsbury, and Michael Picheny. English Broadcast News Speech Recognition by Humans and Machines.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019).