Topic: Speech Recognition

Dave Touretzky edited this page Oct 18, 2018 · 13 revisions

These are working notes for the Speech Recognition topic area.

Grade Progression: What should students know?

  • Grades K-2: Some types of devices can recognize human speech. These include most cellphones and home entertainment systems such as Amazon's Echo or Google Home.

  • Grades 3-5: Speech recognition systems use grammatical knowledge to disambiguate homophones such as bear/bare or there/their/they're. Example: "There is no hot water" vs. "Their hot water is off" vs. "They're waiting for the hot water to come back on".

  • Grades 6-8:

  • Grades 9-12:

Readings for Working Group

  1. Machine Learning is Fun Part 6: How to do Speech Recognition with Deep Learning. Adam Geitgey, Medium, December 2016. medium.com

  2. How Speech Recognition Works. Sudeesh Puthiyedath, CodeGuru.com, August 3, 2006. part 1 and part 2

  3. How Siri Works -- Interview with Tom Gruber, CTO of SIRI. Nova Spivack, NovaSpivack.com, January 26, 2010. novaspivack.com

Old Readings (replaced)

a. Brief Explanation of AI for Layman medium.com

b. Making the Leap from Speech to Dialogue: The Challenge for Human to Machine Communication medium.com

c. CACM January 2014 - A historical perspective of speech recognition. Xuedong Huang, James Baker, and Raj Reddy. Commun. ACM 57, 1 (January 2014), 94-103. DOI: acm.org

d. CACM April 2018 - Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends. Björn W. Schuller. Commun. ACM 61, 5 (April 2018), 90-99. DOI: doi.org/10.1145/3129340 acm.org

e. Video: Speech Emotion Recognition. youtube.com

Demo Resources

Miscellaneous concepts to incorporate

Audio -> Formants -> Phones -> Syllables -> Words -> Phrases

How neural nets improved speech recognition: use of massive training data.

Grammar: recognition does best with conversational English

"How to recognize speech" == "How to wreck a nice beach"
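
One way to illustrate why a recognizer prefers the first phrase over the second: the two sound nearly alike, so the choice has to come from knowledge of which word sequences are likely. The sketch below is a toy bigram language model, not how any particular product works. The corpus, the function names (`train_bigrams`, `log_score`, `pick_transcription`), and the add-one smoothing are all assumptions made for illustration; real systems learn from vastly more text and combine the language score with an acoustic score.

```python
import math
from collections import Counter

# Tiny made-up corpus standing in for the text a real language model learns from.
CORPUS = [
    "how to recognize speech",
    "computers can recognize speech",
    "it is hard to recognize speech",
    "we walked on a nice beach",
    "there is no hot water",
    "their hot water is off",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs, using <s> as a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_score(sentence, unigrams, bigrams):
    """Sum of add-one-smoothed log bigram probabilities; higher means more plausible."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing keeps unseen pairs from getting probability zero.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

def pick_transcription(candidates, unigrams, bigrams):
    """Return the candidate word sequence the toy model finds most plausible."""
    return max(candidates, key=lambda s: log_score(s, unigrams, bigrams))

UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
CANDIDATES = ["how to recognize speech", "how to wreck a nice beach"]
BEST = pick_transcription(CANDIDATES, UNIGRAMS, BIGRAMS)
print(BEST)
```

With this corpus the model picks "how to recognize speech", because pairs such as "recognize speech" occur in the training text while "to wreck" and "wreck a" never do. The same scoring separates homophones: given "there is no hot water" and "their is no hot water", it prefers the first because "there is" appears in the corpus and "their is" does not. Note that a raw sum of log probabilities also penalizes longer sentences, which a real system would account for.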

Languages other than English

Accents; child voices

Applications: Alexa, Siri, Cortana. What do they do? How are they useful?
