---

copyright:
  years: 2019, 2021
lastupdated: "2021-03-21"

subcollection: speech-to-text

---

{:shortdesc: .shortdesc}
{:new_window: target="_blank"}
{:tip: .tip}
{:important: .important}
{:note: .note}
{:deprecated: .deprecated}
{:pre: .pre}
{:codeblock: .codeblock}
{:screen: .screen}
{:javascript: .ph data-hd-programlang='javascript'}
{:java: .ph data-hd-programlang='java'}
{:python: .ph data-hd-programlang='python'}
{:swift: .ph data-hd-programlang='swift'}

# Research references
{: #references}

For more information about the research behind the {{site.data.keyword.speechtotextfull}} service, see the following documents. {{site.data.keyword.IBM}} researchers wrote or contributed to all of these papers.
{: shortdesc}

{: #audhkhasi2017}Audhkhasi, Kartik, Bhuvana Ramabhadran, George Saon, Michael Picheny, and David Nahamoo. Direct Acoustics-to-Word Models for English Conversational Speech Recognition.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 959-963.
{: #audhkhasi2018}Audhkhasi, Kartik, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, and Michael Picheny. Building competitive direct acoustics-to-word models for English conversational speech recognition.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018).
{: #bahl1983}Bahl, Lalit R., Frederick Jelinek, and Robert L. Mercer. A Maximum Likelihood Approach to Continuous Speech Recognition.{: external} IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5(2) (March 1983): pp. 179-190.
{: #fukuda2017}Fukuda, Takashi, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, and Bhuvana Ramabhadran. Efficient Knowledge Distillation from an Ensemble of Teachers.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 3697-3701.
{: #hinton2012}Hinton, Geoffrey, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, and Brian Kingsbury. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.{: external} Signal Processing Magazine, IEEE, Vol. 29(6) (November 2012): pp. 82-97.
{: #jelinek1985}Jelinek, Frederick. The Development of an Experimental Discrete Dictation Recognizer.{: external} Proceedings of the IEEE, Vol. 73(11) (November 1985): pp. 1616-1624.
{: #kurata2017a}Kurata, Gakuto, Abhinav Sethy, Bhuvana Ramabhadran, and George Saon. Empirical Exploration of Novel Architectures and Objectives for Language Models.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 279-283.
{: #kurata2017b}Kurata, Gakuto, Bhuvana Ramabhadran, George Saon, and Abhinav Sethy. Language Modeling with Highway LSTM.{: external} Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (2017).
{: #kurata2019}Kurata, Gakuto, and Kartik Audhkhasi. Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation.{: external} Accepted by Interspeech 2019.
{: #padmanabhan2002}Padmanabhan, Mukund, and Michael Picheny. Large-Vocabulary Speech Recognition Algorithms.{: external} Computer, Vol. 35(4) (2002): pp. 42-50.
{: #picheny2011}Picheny, Michael, David Nahamoo, Vaibhava Goel, Brian Kingsbury, Bhuvana Ramabhadran, Steven J. Rennie, and George Saon. Trends and Advances in Speech Recognition.{: external} {{site.data.keyword.IBM_notm}} Journal of Research and Development, Vol. 55(5) (October 2011): pp. 2:1-2:18.
{: #saon2015}Saon, George, Hong-Kwang J. Kuo, Steven Rennie, and Michael Picheny. The {{site.data.keyword.IBM_notm}} 2015 English Conversational Telephone Speech Recognition System.{: external} Submitted to Interspeech 2015 (2015).
{: #saon2016}Saon, George, Tom Sercu, Steven Rennie, and Hong-Kwang J. Kuo. The {{site.data.keyword.IBM_notm}} 2016 English Conversational Telephone Speech Recognition System.{: external} Submitted to Interspeech 2016 (2016).
{: #saon2017}Saon, George, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, and Phil Hall. English Conversational Telephone Speech Recognition by Humans and Machines.{: external} Proceedings of Interspeech 2017 (August 2017): pp. 132-136.
{: #saon2019}Saon, George, Zoltan Tuske, Kartik Audhkhasi, and Brian Kingsbury. Sequence Noise Injected Training for End-to-end Speech Recognition.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (May 2019).
{: #soltau2014}Soltau, Hagen, George Saon, and Tara N. Sainath. Joint Training of Convolutional and Non-Convolutional Neural Networks.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy (May 2014): pp. 5572-5576.
{: #suzuki2019}Suzuki, Masayuki, Nobuyasu Itoh, Tohru Nagano, Gakuto Kurata, and Samuel Thomas. Improvements to N-gram Language Model Using Text Generated from Neural Language Model.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (May 2019).
{: #thomas2019}Thomas, Samuel, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltan Tuske, George Saon, Brian Kingsbury, and Michael Picheny. English Broadcast News Speech Recognition by Humans and Machines.{: external} Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019).
