Automatic Speech Recognition (speech-to-text)

Implementation based on Listen, Attend and Spell

The Listener (encoder) is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The Speller (decoder) is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters.
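To make the two ideas above concrete, here is a minimal pure-Python sketch of the pyramidal time reduction and a single attention step. It is an illustration under stated assumptions, not this repository's code: the function names `pyramid_reduce` and `attend` are hypothetical, the recurrent layers that would sit between reductions are omitted, and plain dot-product scoring is assumed for the attention.

```python
import math

def pyramid_reduce(frames):
    """Concatenate each pair of consecutive frames, halving the time axis.

    frames: list of feature vectors (lists of floats), one per time step.
    A trailing odd frame is dropped (an assumption; padding is another option).
    """
    usable = len(frames) - len(frames) % 2
    return [frames[t] + frames[t + 1] for t in range(0, usable, 2)]

def attend(query, keys, values):
    """One attention step: softmax over dot-product scores, then a weighted sum.

    query: decoder state; keys/values: one vector per encoder time step.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# Toy input: 8 time steps of 2-dimensional "filter bank" features.
frames = [[float(t), float(t) + 0.5] for t in range(8)]
level1 = pyramid_reduce(frames)   # 4 steps, 4 features each
level2 = pyramid_reduce(level1)   # 2 steps, 8 features each
level3 = pyramid_reduce(level2)   # 1 step, 16 features

# Toy attention over two encoder steps.
weights, context = attend([1.0, 0.0],
                          [[1.0, 0.0], [0.0, 1.0]],
                          [[1.0, 2.0], [3.0, 4.0]])
```

Each reduction halves the number of time steps the decoder has to attend over, which is the point of the pyramid: three reductions shrink the sequence by a factor of eight.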


Training objective: predict the next phoneme in the sequence, given the utterance (voice recording) and the preceding part of its transcript.
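A next-token objective of this kind is commonly trained with teacher forcing and a negative log-likelihood loss. The sketch below shows that setup in pure Python; it assumes this is how the objective is realized, which the README does not state. The `<sos>`/`<eos>` markers, the function names, and the toy vocabulary are all hypothetical.

```python
import math

SOS, EOS = "<sos>", "<eos>"  # hypothetical start/end markers

def teacher_forcing_pairs(tokens):
    """Shift the transcript by one step: the decoder sees tokens[:t]
    (prefixed with SOS) and must predict tokens[t], ending with EOS."""
    decoder_inputs = [SOS] + list(tokens)
    targets = list(tokens) + [EOS]
    return decoder_inputs, targets

def sequence_nll(step_probs, targets):
    """Negative log-likelihood of the target sequence.

    step_probs: one dict per decoding step mapping token -> probability,
    standing in for the decoder's softmax output at that step.
    """
    return -sum(math.log(probs[tok]) for probs, tok in zip(step_probs, targets))

# Toy transcript of two phoneme symbols.
tokens = ["HH", "AY"]
decoder_inputs, targets = teacher_forcing_pairs(tokens)

# Stand-in model output: a uniform distribution over a 4-token vocabulary.
vocab = ["HH", "AY", "K", EOS]
uniform = {tok: 1.0 / len(vocab) for tok in vocab}
loss = sequence_nll([uniform] * len(targets), targets)
```

With a uniform prediction, every step costs `log(4)`, so the loss here equals `3 * log(4)`; training pushes it down by raising the probability assigned to each correct next token.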

Trained on the WSJ0 dataset